Switch Suspension - Recurring CPU Issue

2023-11-28 07:33:36
Model: TL-SG3428  
Hardware Version: V2
Firmware Version: 2.0.11

Hello everyone,

 

I would like to report another hang on our TL-SG3428 switch. The log shows the message: "43557 CPU RISING THRESHOLD: Total CPU Utilization is 90%". This is the second occurrence within the last two months.

Importantly, I actively use SNMP on this switch. I noticed a thread from 2021 on this forum discussing similar CPU RISING problems that might have been related to SNMP. Now, in 2023, one would expect this type of issue to have been resolved.

I want to emphasize that SNMP is a critical tool for me, and I cannot imagine resolving this problem by disabling it. Has anyone else experienced similar issues or can suggest any solutions? Are there known cases where firmware updates have effectively solved this problem?

I appreciate any suggestions and assistance on this matter.

 

 

Latest firmware: 2.0.11

 

Best regards,

KK

 

 

Re:Switch Suspension - Recurring CPU Issue
2023-11-29 02:22:00

Hi @fitful 

Thanks for posting in our business forum.

Unplug all devices from the switch, leave it idle, and keep using SNMP. Does the CPU still rise to 90%?

This is the most straightforward test for your situation, since you believe SNMP is correlated with the CPU rise.

 

Usually, when CPU usage reaches 90%, there may be a loop in your network that is causing a storm. If you look at your log, do you notice anything involving STP/RSTP?

 

Best Regards!
Re:Switch Suspension - Recurring CPU Issue
2023-11-29 07:29:07 - last edited 2023-11-29 07:35:36

  @Clive_A 

You might be right that I prematurely identified SNMP as the problem, influenced by another thread with a similar message.
The logs show nothing except CPU usage climbing to 85%, then 90%, and staying there for a while. After some time the network slows down (I'm using MAC-based authentication) and devices can't authenticate. Restarting the switch resolves the issue until the next CPU spike.

I've currently disabled SNMP and will observe, but it might take a month or two to see if there's any change.

 

Nov 28 03:33:30 2023-11-28 03:33:30 TL-SG3428-XXXXX 43557 CPU RISING THRESHOLD: Total CPU Utilization is 85%.
Nov 28 03:36:30 2023-11-28 03:36:30 TL-SG3428-XXXXX 43557 CPU RISING THRESHOLD: Total CPU Utilization is 90%.
Nov 28 03:39:30 2023-11-28 03:39:30 TL-SG3428-XXXXX 43557 CPU RISING THRESHOLD: Total CPU Utilization is 90%.
Nov 28 03:42:31 2023-11-28 03:42:31 TL-SG3428-XXXXX 43557 CPU RISING THRESHOLD: Total CPU Utilization is 90%.
Nov 28 03:45:31 2023-11-28 03:45:31 TL-SG3428-XXXXX 43557 CPU RISING THRESHOLD: Total CPU Utilization is 90%.
Nov 28 03:48:32 2023-11-28 03:48:32 TL-SG3428-XXXXX 43557 CPU RISING THRESHOLD: Total CPU Utilization is 90%.
[...]
Nov 28 07:00:56 2023-11-28 07:00:57 TL-SG3428-XXXXX 43557 CPU RISING THRESHOLD: Total CPU Utilization is 90%.
Nov 28 07:03:57 2023-11-28 07:03:57 TL-SG3428-XXXXX 43557 CPU RISING THRESHOLD: Total CPU Utilization is 90%.
Nov 28 07:06:57 2023-11-28 07:06:57 TL-SG3428-XXXXX 43557 CPU RISING THRESHOLD: Total CPU Utilization is 90%.
Nov 28 07:07:08 2023-11-28 07:07:08 TL-SG3428-XXXXX 53449 Gi1/0/13 changed state to up.
Nov 28 07:07:11 2023-11-28 07:07:12 TL-SG3428-XXXXX 53450 Gi1/0/13 changed state to down.
Nov 28 07:09:58 2023-11-28 07:09:58 TL-SG3428-XXXXX 43557 CPU RISING THRESHOLD: Total CPU Utilization is 90%.
Nov 28 07:12:58 2023-11-28 07:12:58 TL-SG3428-XXXXX 43557 CPU RISING THRESHOLD: Total CPU Utilization is 90%.
Nov 28 07:15:58 2023-11-28 07:15:59 TL-SG3428-XXXXX 43557 CPU RISING THRESHOLD: Total CPU Utilization is 90%.

 

// OOOOO is a different device
Nov 28 07:18:47 2023-11-28 07:18:48 TL-SG3428-OOOOO 53449 Gi1/0/4 changed state to up.
Nov 28 07:18:57 2023-11-28 07:18:58 TL-SG3428-OOOOO 60130 MAC authentication passed, port 4, MAC XX-XX-XX-XX-XX-XX, vid 1.

 

Nov 28 07:18:59 2023-11-28 07:18:59 TL-SG3428-XXXXX 43557 CPU RISING THRESHOLD: Total CPU Utilization is 90%.
Nov 28 07:21:59 2023-11-28 07:21:59 TL-SG3428-XXXXX 43557 CPU RISING THRESHOLD: Total CPU Utilization is 90%.
Nov 28 07:25:00 2023-11-28 07:25:00 TL-SG3428-XXXXX 43557 CPU RISING THRESHOLD: Total CPU Utilization is 90%.
Nov 28 07:28:00 2023-11-28 07:28:00 TL-SG3428-XXXXX 43557 CPU RISING THRESHOLD: Total CPU Utilization is 90%.
Nov 28 07:31:00 2023-11-28 07:31:00 TL-SG3428-XXXXX 43557 CPU RISING THRESHOLD: Total CPU Utilization is 90%.
Nov 28 07:34:01 2023-11-28 07:34:01 TL-SG3428-XXXXX 43557 CPU RISING THRESHOLD: Total CPU Utilization is 90%.
Nov 28 07:37:01 2023-11-28 07:37:01 TL-SG3428-XXXXX 43557 CPU RISING THRESHOLD: Total CPU Utilization is 90%.
Nov 28 07:40:01 2023-11-28 07:40:02 TL-SG3428-XXXXX 43557 CPU RISING THRESHOLD: Total CPU Utilization is 90%.
Nov 28 07:43:02 2023-11-28 07:43:02 TL-SG3428-XXXXX 43557 CPU RISING THRESHOLD: Total CPU Utilization is 90%.
Nov 28 07:46:02 2023-11-28 07:46:02 TL-SG3428-XXXXX 43557 CPU RISING THRESHOLD: Total CPU Utilization is 90%.
Nov 28 07:49:02 2023-11-28 07:49:03 TL-SG3428-XXXXX 43557 CPU RISING THRESHOLD: Total CPU Utilization is 90%.
Nov 28 07:52:03 2023-11-28 07:52:03 TL-SG3428-XXXXX 43557 CPU RISING THRESHOLD: Total CPU Utilization is 90%.
Nov 28 07:55:03 2023-11-28 07:55:03 TL-SG3428-XXXXX 43557 CPU RISING THRESHOLD: Total CPU Utilization is 90%.
Nov 28 07:58:04 2023-11-28 07:58:04 TL-SG3428-XXXXX 43557 CPU RISING THRESHOLD: Total CPU Utilization is 90%.
Nov 28 08:01:04 2023-11-28 08:01:04 TL-SG3428-XXXXX 43557 CPU RISING THRESHOLD: Total CPU Utilization is 90%.
Nov 28 08:04:04 2023-11-28 08:04:05 TL-SG3428-XXXXX 43557 CPU RISING THRESHOLD: Total CPU Utilization is 90%.
Nov 28 08:07:05 2023-11-28 08:07:05 TL-SG3428-XXXXX 43557 CPU RISING THRESHOLD: Total CPU Utilization is 90%.

 

// reboot device XXXXX

Nov 28 08:09:30 2023-11-28 08:09:30 TL-SG3428-XXXXX 53449 Gi1/0/24 changed state to up.
Nov 28 08:09:31 2023-11-28 08:09:31 TL-SG3428-XXXXX 53449 Gi1/0/12 changed state to up.
Nov 28 08:09:44 2023-11-28 08:09:44 TL-SG3428-XXXXX 60130 MAC authentication passed, port 3, MAC XX-XX-XX-XX-XX-XX, vid 11.
Nov 28 08:09:45 2023-11-28 08:09:45 TL-SG3428-XXXXX 60130 MAC authentication passed, port 4, MAC XX-XX-XX-XX-XX-XX, vid 11.
Nov 28 08:09:46 2023-11-28 08:09:46 TL-SG3428-XXXXX 60130 MAC authentication passed, port 5, MAC XX-XX-XX-XX-XX-XX, vid 11.
Nov 28 08:09:50 2023-11-28 08:09:50 TL-SG3428-XXXXX 60130 MAC authentication passed, port 10, MAC XX-XX-XX-XX-XX-XX, vid 11.
Nov 28 08:09:51 2023-11-28 08:09:51 TL-SG3428-XXXXX 60130 MAC authentication passed, port 11, MAC XX-XX-XX-XX-XX-XX, vid 11.
 

STP is intentionally disabled in our network because it's not needed at the switch level. However, we have enabled loopback detection, configured as port-based for access ports and as alert-only on the uplink port. Despite this precaution, the logs show no loopback detection trap being triggered.
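For anyone who wants to summarise a log like the one above, here is a minimal Python sketch. It assumes the syslog line layout shown in this post; the function names (`parse_cpu_alerts`, `group_episodes`) and the 10-minute gap used to decide that two alerts belong to the same episode are illustrative choices, not anything provided by TP-Link.

```python
import re
from datetime import datetime, timedelta

# Matches the layout seen in this thread:
# "<syslog ts> <YYYY-MM-DD HH:MM:SS> <host> <msg id> CPU RISING THRESHOLD: ... is NN%."
LINE_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<msgid>\d+) "
    r"CPU RISING THRESHOLD: Total CPU Utilization is (?P<pct>\d+)%"
)


def parse_cpu_alerts(lines):
    """Return (timestamp, host, percent) for every CPU threshold line."""
    alerts = []
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S")
            alerts.append((ts, m.group("host"), int(m.group("pct"))))
    return alerts


def group_episodes(alerts, max_gap=timedelta(minutes=10)):
    """Merge alerts from the same host into episodes.

    max_gap is an assumed value: the switch in this thread logs roughly
    every 3 minutes, so a gap above 10 minutes starts a new episode.
    """
    episodes = []
    open_by_host = {}
    for ts, host, pct in sorted(alerts):
        ep = open_by_host.get(host)
        if ep is not None and ts - ep["end"] <= max_gap:
            ep["end"] = ts
            ep["peak"] = max(ep["peak"], pct)
            ep["count"] += 1
        else:
            ep = {"host": host, "start": ts, "end": ts, "peak": pct, "count": 1}
            open_by_host[host] = ep
            episodes.append(ep)
    return episodes


SAMPLE = [
    "Nov 28 03:33:30 2023-11-28 03:33:30 TL-SG3428-XXXXX 43557 CPU RISING THRESHOLD: Total CPU Utilization is 85%.",
    "Nov 28 03:36:30 2023-11-28 03:36:30 TL-SG3428-XXXXX 43557 CPU RISING THRESHOLD: Total CPU Utilization is 90%.",
    "Nov 28 03:39:30 2023-11-28 03:39:30 TL-SG3428-XXXXX 43557 CPU RISING THRESHOLD: Total CPU Utilization is 90%.",
    "Nov 28 07:07:08 2023-11-28 07:07:08 TL-SG3428-XXXXX 53449 Gi1/0/13 changed state to up.",
]

episodes = group_episodes(parse_cpu_alerts(SAMPLE))
for ep in episodes:
    duration = ep["end"] - ep["start"]
    print(f"{ep['host']}: {ep['count']} alerts over {duration}, peak {ep['peak']}%")
```

Run against a full syslog export, this gives the start time, duration, and peak of each high-CPU episode per switch, which makes it easier to compare episodes with other events such as port flaps or authentication failures.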

 

Re:Switch Suspension - Recurring CPU Issue
2023-11-29 08:25:20

Hi @fitful 

Thanks for posting in our business forum.

You can turn on STP and monitor it as well. STP does not take many resources. With STP enabled, let's see whether the CPU spikes again.

Best Regards!
Re:Switch Suspension - Recurring CPU Issue
2023-11-29 22:21:13 - last edited 2023-11-29 22:23:52

@fitful I experienced the same problem a few weeks ago, though on a different model: my TL-SG3428X v1.0 switch. @Clive_A you may recall the thread I have open (there are still issues, but this was a big part). I tried reboots, disabling RSTP/STP/Loopback/Bonds, and a factory reset. The CPU stayed pegged at 90%. I ended up moving every single device to other switches, and it was still pegged at 90%. I thought Omada might be receiving or interpreting the performance statistics incorrectly, so I factory reset the switch and logged in while it was in standalone mode. Still 90%.

 

What was left was my SFP uplinks, and they turned out to be the problem. It might be an issue with the JetStream or the specific SFP adapter, but I found that the CPU would jump dramatically whenever a module was plugged in without being active. Taking them all out resolved the CPU issue. Adding one disconnected SFP (i.e. fibre not connected, or even with the port down in Omada) would drive the CPU up to nearly 50%, and a second one to 90%. Even with media fully connected and little traffic, the modules still cause noticeable CPU usage: I see about 15% additional CPU usage from the SFP module on my two TL-SG3210XHP-M2 v1.0 switches, which are downstream and have only a single SFP connected at this point.

 

This may not be the source of your problem, but you never know.

Re:Switch Suspension - Recurring CPU Issue
2023-11-30 01:10:41

Hi @seanvan 

Thanks for posting in our business forum.

With the SFP module plugged in, is there any fiber link to the module? If not, the CPU will stay at a higher level than normal (normal meaning no module and no fiber).

When the module is plugged in but idle (no fiber connected to it), CPU usage stays high. This is normal due to the chipset design.

If you don't use the SFP port, you can unplug the module from the switch to avoid high CPU usage. When the module is in use with fiber connected, CPU usage falls back to a normal level.

But your earlier post was about the Rootnode stuff. It did not mention any SFP.

Best Regards!
Re:Switch Suspension - Recurring CPU Issue
2023-12-18 20:24:21

@Clive_A my last post was about testing what I discovered. I do realize that CPU usage increases when the SFP module has no media connected, but the results I experienced were significantly higher than one would expect. Simply having 2 SFP modules connected with media, while the far end remains disconnected, was enough to make the switch inoperable.

 

My post was about performance issues, and I originally thought they were due to STP/RSTP convergence issues, as this was the only message being logged in the Omada controller. I then discovered multiple issues, including this one involving SFP modules. I certainly agree that in production, one should either have the SFP and media in use or fully removed. However, I have yet to get a fully stable network with redundant links, so while building out the network it is common to leave cables temporarily disconnected. Sadly, this contributes significant load and can take down the network, which is frustrating to say the least.

 

I really wish we had the ability to use MLAG and/or MSTP, as the switches are capable of both in standalone mode. Are there any discussions about whether TP-Link will bring these to controller mode at some point? What does TP-Link recommend to customers who wish to achieve active/active load balancing and high availability? Would you recommend running the switches in standalone mode and using the Controller only for the access points? @fitful sorry for hijacking your thread, I can move the discussion with Clive elsewhere. I hope you have had luck in sorting out your high CPU issue.

 

Thanks,

Sean
