New EAP245 HW.4 (Canada) - Bad firmware or possibly a bad batch of these Units
This is my 3rd EAP245, but this one is HW. version 4. My other two are HW version 3.
New EAP245 Firmware is: 1.2.1 Build 20230824 Rel. 61490
I have the Software controller deployed in the network, Controller version 5.13.22.
Check the screenshots out
The controller has no issues adopting and provisioning the AP on the network.
- The Status shows that its channels are Busy, there is no TX/RX reporting, and the 'Tools' menu is missing.
- CPU usage is erratic, inconsistent and higher than it should be, especially with no clients connected to it.
Compare that to one of my other EAP245s, which has a number of connected clients.
I've done a factory reset twice so far and even tried using it as a standalone AP, not part of the controller. The CPU is still oddly high and wireless still shows Busy when it's in AP mode.
Bad firmware?
Or do you think I received a lemon?
Hi @iWebAdmin,
iWebAdmin wrote
- The Status shows that its channels are Busy, there is no TX/RX reporting, and the 'Tools' menu is missing.
Due to chip differences, the channel utilization on EAP225 v4 cannot distinguish Tx/Rx frames, so Tx/Rx and Interference are uniformly represented by Busy.
- CPU usage is erratic, inconsistent and higher than it should be, especially with no clients connected to it.
As more and more new features are added to the EAP firmware, it is normal for some CPU to be in use even when no clients are connected.
Please check whether the actual wireless environment is affected and whether clients have had any disconnection issues.
@Hank21
I've been in IT for over 23 years now and have worked with many networking devices. This is not normal, and it isn't explained by the additional 'features' you say keep being added. If anything, as I explained in my first post, I'm MISSING features on the HW4 model compared to my HW3 models.
I keep the house at a cool 65F - 67F during the winter. The bottom of this new EAP is hot compared to the HW3 models: a little too hot for a unit with no clients connected, though not enough to spontaneously burst into flames. On top of that there is the constant CPU utilization, which fluctuates between 18 - 25%. I connected one of my TVs to it to test CPU utilization while streaming an HDR movie on Netflix, and it gets up to 28 - 32% with one streaming device!
The other two EAP245 HW.3 units I have average between 3% - 7% usage and are barely warm to the touch on the bottom. I have about 50+ devices connected.
I've already opened a support case on this, and I've used TFTP to test a couple of different firmware versions to see if that would correct the issue(s), but it hasn't. There is definitely an underlying issue, and I have a feeling I got a bad apple here.
Screenshot of one of the EAP245s HW3, with about 20 - 25 clients on it over a 24 hour period:
Screenshot of the new EAP245 HW4. 0 clients (Keeping clients off it) over an ~12 hour period. (Notice I'm missing the Tools option?)
Hi @iWebAdmin,
We noticed that you opened a ticket in the support system. The senior engineer has received your feedback on the issue via email and will continue to follow up on your case. If you have any additional information, please feel free to reply to the support email with case ID TKID231222710. Many thanks for your cooperation and patience!
Once the issue is addressed or resolved, I'd encourage you to share it with the community.
Hey @Hank21
Thanks for the reply.
Just an update on my end so far, as support appears not to know what is going on here as of yet.
So I received my replacement from Amazon for the EAP245 v4. And to my surprise, it has the exact same issues as the first one I received.
1.) CPU is at a CONSTANT 20 - 25% utilization on the v4 of these EAP245s and randomly spikes to 50 - 60 with zero clients on it.
This EAP runs hotter than the other two EAP v3. The bottom of the AP is actually HOT. Ambient temp in the house here is 62F, and the EAP v4. is still HOT. This is my main concern here, as I can't imagine how hot it will get in the Summer..
2.) The 2.4 GHz band with NO clients is always between 25% - 40% busy. The other EAP245 v3 units give far more accurate readings by comparison.
3.) Missing options on the v4 device. There is NO Tools tab, unlike on the EAP245 v3.
So with this being my second EAP245 v4 unit with the exact same results as the first, we either have an underlying issue with the current firmware, OR there is a bad batch of these EAP245 v4 devices.
So my question to support: these devices have SSH capability, which I think is available in Standalone AP mode, so shouldn't we be able to identify what process is eating up 20%+ of the CPU when idle with no clients?
If anyone else is experiencing the same issues here with the EAP245 v4 (Canada), please let me know.
I SSH'ed into the EAP245 v3 and the EAP245 v4, and one of the first things I noticed about the v4 model is that its CPU load (load average) is a constant 1.0! Compare that to the EAP245 v3 model, which sits at a load of about 0.04.
The v4 model is dual-core (ARMv7), so this basically equates to about 50% of the actual CPU capacity being used. The reason we don't see that in TOP could be an I/O bottleneck (the Linux load average also counts tasks stuck waiting on I/O), which leads back to the constant 10% that the "kworker" process is running at. The kernel in this model is Linux EAP245 4.4.198.
On the EAP245 v3, by comparison, the 'kworker' process only pops up intermittently, hitting about 5 - 7% CPU, then drops back down. For reference, this model's CPU is a single-core Qualcomm Atheros QCA956X, and its kernel is Linux EAP245 3.3.8.
So the kworker is what appears to be the culprit here.
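For anyone who wants to sanity-check the "load 1.0 on two cores is about 50%" reasoning, here is a minimal Python sketch to run on a workstation, not on the AP. The function names and the sample /proc/loadavg line are made up for illustration; only the load and core figures come from this post. It treats load average divided by core count as a rough utilization estimate, which is an approximation because the load average also includes tasks waiting on I/O.

```python
def parse_loadavg(text: str) -> float:
    """Return the 1-minute load average from the contents of /proc/loadavg."""
    return float(text.split()[0])


def utilization_from_load(load_avg: float, cores: int) -> float:
    """Rough fraction of total CPU capacity implied by a load average."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return load_avg / cores


# Figures quoted in this post: v4 is dual-core at load ~1.0,
# v3 is single-core at load ~0.04.
v4 = utilization_from_load(1.0, cores=2)
v3 = utilization_from_load(0.04, cores=1)
print(f"EAP245 v4: {v4:.0%} of capacity")
print(f"EAP245 v3: {v3:.0%} of capacity")

# Hypothetical /proc/loadavg contents, only to show the parsing step.
example_line = "1.00 0.98 1.01 1/75 2312"
print(utilization_from_load(parse_loadavg(example_line), cores=2))
```

The same calculation can be fed from a saved copy of /proc/loadavg taken over SSH, if the admin user can read it.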
I tried accessing the klogs and also other logs but unfortunately, the admin user you log in with does NOT have root permissions to R/W. Is there a way to gain root access here? I tried a few commands with busybox and login, but no go. Password error.
Here are the DMESG errors for the v4 model. I found an unsupported function (in a kernel module) that is looping!
[63147.472752] AsicSetSyncModeAndEnable(): NotSupportedFunc for this arch(HIF_MT)!
[63147.866626] Max Bss Table is 256, MAX_LEN is : 256
[63147.872820] ScanTab num is 7
[63147.877088] Total Bss num: 7
[63157.238962] *****[1]Set_SiteSurvey_Proc(): try to get lock
[63157.245395] *****[2]Set_SiteSurvey_Proc(): get lock
[63157.252622] SYNC - sync_fsm_scan_req_action:[1308] LAST_CH: 0, BAND: 1
[63157.259535] AsicDisableSync(): NotSupportedFunc for this arch(HIF_MT)!
[63157.267523] AsicSwitchChannel(): 5G Channel:36, then must be Channel_Band:1 !!
[63157.288454] ExtEventBeaconLostHandler::FW LOG, Beacon lost (**:**:31:b1:**:**), Reason 0x10
[63157.296842] Beacon lost - AP disabled!!!
[63157.302576] ExtEventBeaconLostHandler::FW LOG, Beacon lost (**:**:31:b1:**:**), Reason 0x10
[63157.310958] Beacon lost - AP disabled!!!
[63157.315216] ExtEventBeaconLostHandler::FW LOG, Beacon lost (**:**:31:b1:**:**), Reason 0x10
[63157.323583] Beacon lost - AP disabled!!!
[63157.330626] ExtEventBeaconLostHandler::FW LOG, Beacon lost (**:e9:**:**:**:15), Reason 0x10
[63157.339001] Beacon lost - AP disabled!!!
[63157.345820] ExtEventBeaconLostHandler::FW LOG, Beacon lost (**:**:**:b2:**:15), Reason 0x10
[63157.354195] Beacon lost - AP disabled!!!
...
...
[63158.674090] AsicSetSyncModeAndEnable(): NotSupportedFunc for this arch(HIF_MT)!
[63158.681446] *****[3]scan_ch_restore(): try to release lock
[63158.686951] *****[4]scan_ch_restore(): released lock
[63158.692023] AsicSetSyncModeAndEnable(): NotSupportedFunc for this arch(HIF_MT)!
...
...
[63206.204218] asic_txbf_bfee_adaption(): NotSupportedFunc for this arch(HIF_MT)!
[63206.211641] [PMF]APPMFInit:: apidx=8, MFPC=0, MFPR=0, SHA256=0
[63206.217607] wifi_sys_linkdown(), wdev idx = 8
[63206.222039] ExtEventBeaconLostHandler::FW LOG, Beacon lost (**:e9:**:b1:**:15), Reason 0x10
[63206.222042] Beacon lost - AP disabled!!!
[63206.234695] wifi_sys_linkdown(8): get wf_link_lock semaphore success.
It's found all throughout the DMESG output, which most likely explains the high CPU usage of the kworker process.
And the AP dropping clients..
So from observation, you have a kernel module misconfiguration here causing the issue.. Good thing is that this can be resolved with a firmware upgrade.
To note as well, I did a grep for "NotSupportedFunc" on the DMESG for the EAP245v3 model, and I got zero results back.
/proc $ dmesg |grep NotSupportedFunc
/proc $
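To put a number on how often these messages repeat, the DMESG output can be saved to a file and summarized offline. Below is a rough Python sketch under that assumption; the function name and regular expression are my own, and the sample lines are copied from the excerpt above. It counts NotSupportedFunc messages per function name and reports the time span they cover.

```python
import re
from collections import Counter

# Matches lines such as:
# [63147.472752] AsicSetSyncModeAndEnable(): NotSupportedFunc for this arch(HIF_MT)!
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(\w+)\(\): NotSupportedFunc")


def summarize_unsupported(dmesg_text: str):
    """Count NotSupportedFunc messages per function; return (counts, span_seconds)."""
    counts = Counter()
    stamps = []
    for line in dmesg_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            stamps.append(float(m.group(1)))
            counts[m.group(2)] += 1
    span = max(stamps) - min(stamps) if stamps else 0.0
    return counts, span


# Sample lines taken from the v4 DMESG excerpt in this post.
sample = """\
[63147.472752] AsicSetSyncModeAndEnable(): NotSupportedFunc for this arch(HIF_MT)!
[63147.866626] Max Bss Table is 256, MAX_LEN is : 256
[63157.259535] AsicDisableSync(): NotSupportedFunc for this arch(HIF_MT)!
[63158.674090] AsicSetSyncModeAndEnable(): NotSupportedFunc for this arch(HIF_MT)!
[63158.692023] AsicSetSyncModeAndEnable(): NotSupportedFunc for this arch(HIF_MT)!
[63206.204218] asic_txbf_bfee_adaption(): NotSupportedFunc for this arch(HIF_MT)!
"""

counts, span = summarize_unsupported(sample)
for name, n in counts.most_common():
    print(f"{name}: {n}")
print(f"{sum(counts.values())} messages over {span:.1f} s")
```

Running the same function on the v3 output should give an empty count, matching the empty grep result above.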
Hi @iWebAdmin,
Thanks for your updates!
Our senior engineer has received your feedback on the issue via email and will continue to follow up on your case. If you have any additional information, please feel free to reply to the support email with case ID TKID231222710. Many thanks for your cooperation and patience!
Thank you for your valued update on the case.
Hi @Hank21
I would love to further assist in finding the root cause (which I'm 98% sure I have already) and a solution for this issue. Unfortunately, my hands are tied, since I can't look at four-fifths of the logs because they are owned by root, and the system image is also R/O.
If I had root access and firmware with a R/W system image, I could check the other logs and install 'lsof', which would allow me to look at all the dependencies that those two kworker processes are using, pointing me to the specific kernel module. I can guarantee you it would only take me a couple of days, tops, to figure out the root cause and the solution.
I'm ranting here now, as the reply I received from support last night seems to be trying to justify the HEAT output as normal, despite the obvious evidence I found with the kworker processes and the DMESG errors about specific kernel module(s) not supporting functions.
(Side-by-side screenshots: EAP245 v3 | EAP245 v4)
Here was the reply from support last night:
"EAP245 V3 and V4 use different chip solutions. We will try to reproduce it locally to see if the same problem occurs. Regarding the issue of heating, the overall power consumption of EAP245V4 is higher than that of EAP245V3. Under normal circumstances, the bottom case temperature of V4 is higher than that of V3. No judgment can be made based on this description. And judging from the description, the new EAP245 v4 has the same problem, and it doesn't seem like a prototype hardware abnormality."
Here is part of my response of which I already included some of the above info in it:
"I'm aware of the hardware and software changes between both EAP245 v3 and v4..
Okay, so you guys haven't had a chance yet to try and replicate this. Please do so sooner rather than later, since I have to pay to ship these units back before Jan. 20th if we don't get this resolved.
The serial numbers for these two devices look like they could be from the same batch:
- EAP245 v4 #1 -- Serial 2*****5000493
- EAP245 v4 #2 -- Serial 2*****5000584
So you guys have yet to have a chance to try and replicate this? What's the holdup? I seem to be just wasting my money here now. For the unit you will be testing, look at the serial number and check whether it is in the range of the serials I listed above. Unfortunately, I'm not sure how many units are manufactured in a single 'batch'.
Did you mention that these units are prototypes?
You guys need to look at the 'kworker' processes, because a dual-core CPU running a load of 1.0+ at idle is NOT normal...
And you're saying that this type of heat is not an issue and is NORMAL? That is a constant 50% of its power, which in turn pumps up the heat in the unit. You do realize this kind of heat on the CPU diode can deteriorate it and greatly shorten its lifespan? If the plastic on the bottom of the unit is 40 C+, imagine what the temp is on the CPU diode. And I'm not going to attempt taking the unit apart to check it, since one of them needs to be sent back before Jan. 15th. Both my EAP245 v3 units have a measured temperature half that of the v4 units: 20 C MAX on the bottom plastic. So this is DOUBLE.
Let's check some of the main issues surrounding the 'kworker' process and how it directly affects CPUs, see here:
kworker high CPU and other issues
Given that I received two units experiencing the same issue, I don't see this as much of a coincidence: both show the 'kworker' processes running high, and the evidence from DMESG points to kernel module errors being thrown in a loop.
Let's do this so I'm not wasting any more money. Once you try to replicate this on your end, if you see NO issues on your test EAP245 v4 unit, then you can send me that specific UNIT you tested with, as I don't want to take a chance anymore going through Amazon."
Anyways, here is some additional info on the specific kworker processes that seems to be the issue:
EAP245 v3:
/bin $ ps |grep kworker
    7 0 [kworker/u:1]
  549 0 [kworker/u:2]
 1322 0 [kworker/0:0]
24777 0 [kworker/0:2]
26142 0 [kworker/0:1]
EAP245 v4:
/bin $ ps |grep kworker
And only seems to be these two.
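As a rough aid for reading `ps` listings like the v3 one above, here is a small Python sketch that groups kworker threads by pool. It relies on the usual kernel naming, where `kworker/0:1` is a worker bound to CPU 0 and a `u` prefix marks an unbound pool; the function name and regular expression are my own, and newer kernels that append a work name after the ID are not handled.

```python
import re
from collections import Counter

# kworker/0:1  -> worker bound to CPU 0
# kworker/u:2  -> unbound pool (3.x naming); later kernels use e.g. kworker/u4:1
# A trailing H marks a high-priority worker.
KWORKER_RE = re.compile(r"\[kworker/(u?)(\d*):(\d+)H?\]")


def classify_kworkers(ps_text: str) -> Counter:
    """Count kworker threads per pool: 'unbound' or 'cpuN'."""
    pools = Counter()
    for unbound, cpu, _worker_id in KWORKER_RE.findall(ps_text):
        pools["unbound" if unbound else f"cpu{cpu}"] += 1
    return pools


# The EAP245 v3 listing quoted in this post.
v3_ps = """\
    7 0 [kworker/u:1]
  549 0 [kworker/u:2]
 1322 0 [kworker/0:0]
24777 0 [kworker/0:2]
26142 0 [kworker/0:1]
"""

for pool, n in sorted(classify_kworkers(v3_ps).items()):
    print(f"{pool}: {n}")
```

This only groups the thread names. It does not show which driver queued the work, which is the part that would need root access.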
I am seeing the same thing on my end. All of my v3 access points show minimal CPU usage, while the v4 access point I purchased from Amazon is showing 20% base CPU usage.
I also noticed a lot of informational messages being sent from my v4 access point to my syslog server. I'm not sure if this is the cause of some of the CPU usage issues. I couldn't find a way to turn off the messages.
Hopefully TP-Link addresses these issues soon with a firmware update.
Hey there, sorry for the delay in getting back to you.
Yeah, it seems that the stock firmware the EAP245 v4 (CA) model ships with has a major bug in it involving a kernel module, i.e. the one I identified, brought to their attention, and ended up working with them on for over a month.
Anyways, long story short: they provided a couple of different beta firmware builds to test out. The first one corrected the kernel issue, BUT all my IoT devices had major issues, including tons of dropped packets and dropped clients. After I explained this to them, they eventually got back to me with another BETA firmware to test. I've been running this specific firmware for over two weeks now, and I'm glad to say that it has corrected the majority of the issues I brought to their attention.
I'm not sure how they run their QA department, but I'm really surprised that they shipped firmware that had a NOTICEABLE issue on these EAP245 v4 models. This is a big NO NO, especially when they advertise the Omada product line as Business / Consumer level.
They haven't updated the firmware on their download page yet for the EAP245 v4 (CA) model, but I don't think there should be any harm in me sharing that with you.
Let me know if you have any questions at all.
Cheers,