TL-R470T+ CPU 100% broken download and device unresponsive

2022-06-20 22:24:36 - last edited 2022-06-23 22:55:33
Model: TL-R470T+  
Hardware Version: V6
Firmware Version: 6.0.4 Build 20200313 Rel.32850

Hi,

I just wanted to share how I found a workaround for the 100% CPU issue in my setup.

 

What is my current architecture?

 

* Connections: ISP #1/#2 <-> TL-R470T+ <-> Deco M9 Plus <-> WLAN


* ISP #1: Telekom VDSL 100/40 Mbps from a SpeedPort Smart 3 router.
* ISP #2: Vodafone VDSL over LTE 4G 100/40 Mbps from a Gigacube CAT9 router.


* Internet Failover served from TL-R470T+, relevant settings:
  * Hardware: V6
  * Firmware: 6.0.4 Build 20200313 Rel.32850
  * Port 1 - WAN1: ISP #1, Static IP, Speed: 90d/35u Mbps.
  * Port 2 - WAN2: ISP #2, Static IP, Speed: 90d/35u Mbps.

  * Port 5 - LAN: Deco M9 Plus LAN, DHCP: Enabled.

  * Network:

    * Switch > Flow Control > Enabled (All Ports).
    * IPTV > IGMP Snoop. v3 & IGMP Proxy > Enabled

  * Transmission: 
    * Load Balancing: Enabled.
      * Online Detection:
        * PING: 8.8.8.8 (Avoid the ISP router, which can delay ICMP/PING replies when stressed)
        * DNS: ISP Router IP (Avoid Google, but ensure Internet DNS resolution is working)
    * Routing > Policy Routing > Rule:
      1, ALL, IPGROUP_ANY, IPGROUP_ANY, WAN1, ANY, "", Priority, Enabled.
    * Bandwidth Control: Enabled.
  * Services > UPnP > Enabled (To use qBittorrent).
  * System Tools > SNMP > Enabled (with defaults).


* WiFi Dual Band served from a mesh of Deco M9 Plus, relevant settings:
  * Hardware: V2
  * Firmware: 1.5.6 Build 20211018 Rel. 35617
  * 3x WiFi Router Mesh.
  * QoS > Custom > Low: Gaming, Medium: Streaming/Downloading, High: Surfing/Chat. 
  * QoS > Internet Bandwidth > 90d/35u Mbps
  * Beamforming: Disabled (To avoid random MacBook disconnections from the WiFi).

 

* Windows PC, relevant settings (To avoid random disconnections from the WiFi):
   Intel Wireless Card: "Recommended Settings for 802.11ac Connectivity" on the Intel site, Article ID: "000024678"

 

What was the problem?
* The speed of any file download dropped to 0 bps, and the download then failed with a network error.
* The failure happened at the exact moment the TL-R470T+ CPU hit 100% (Web GUI unresponsive) or the hardware hung.
* Testing the speed with the Netflix speed test caused the same CPU issue and also killed any file download in progress.

 

What was the culprit?
* Basically, the Deco M9 Plus has a 1 Gbps interface and was flooding the 100 Mbps port of the TL-R470T+.
* The TL-R470T+ was doing WAN failovers because online detection was failing (PING to one of the ISP routers, Telekom).
* The problem does not happen if the Deco M9 Plus is connected directly to either of the ISP routers.

 

What other changes did I test that didn't work? (One by one.)
* Adding rules to control the bandwidth in the TL-R470T+ (the situation got even worse).
* Enabling firewall attack defense for flood and/or anomaly defense (tried just out of curiosity; the situation got even worse).
* Lowering the WAN MTU to 1492 because of packet fragmentation (seen in the SNMP stats for ISP #1; see YouTube > How TCP really works: MTU vs MSS (David Bombal)).
* Disabling load balancing and policy routing, or using the link backup scheme.
* Relying on only one ISP connection (Vodafone was better than Telekom).
* Toggling bandwidth control.
* Toggling flow control.
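
As a side note on the MTU item in the list above, here is a minimal sketch of the arithmetic behind MTU, MSS and fragmentation. It assumes plain IPv4 and TCP headers without options (20 bytes each), and it assumes that 1492 comes from the usual 8 bytes of PPPoE overhead on a 1500-byte Ethernet MTU; I have not verified that this is what the ISP link actually does. The function names are made up for the example.

```python
import math

IPV4_HEADER_BYTES = 20   # IPv4 header without options
TCP_HEADER_BYTES = 20    # TCP header without options
PPPOE_OVERHEAD_BYTES = 8 # assumed reason for 1492 = 1500 - 8


def tcp_mss(mtu: int) -> int:
    """Largest TCP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER_BYTES - TCP_HEADER_BYTES


def fragments_needed(packet_size: int, mtu: int) -> int:
    """Number of IPv4 fragments needed to carry one packet over a link."""
    if packet_size <= mtu:
        return 1
    payload = packet_size - IPV4_HEADER_BYTES
    # Every fragment except the last carries a multiple of 8 payload bytes.
    per_fragment = (mtu - IPV4_HEADER_BYTES) // 8 * 8
    return math.ceil(payload / per_fragment)


if __name__ == "__main__":
    print("MSS at MTU 1500:", tcp_mss(1500))
    print("MSS at MTU 1492:", tcp_mss(1500 - PPPOE_OVERHEAD_BYTES))
    print("Fragments for a 1500-byte packet over MTU 1492:",
          fragments_needed(1500, 1492))
```

Under these assumptions a full 1500-byte packet has to be split in two on a 1492-byte link, which would fit the fragmentation seen in the SNMP stats, although lowering the MTU alone did not fix the CPU issue here.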

 

What is the workaround?
* Prevent any 100 Mbps port of the TL-R470T+ from receiving traffic above 90% of its capacity.

* Prevent failovers when the CPU reaches 100% by adjusting the Online Detection settings.
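
The 90% rule above can be sketched as a quick sanity check. This is only an illustration with made-up helper names, assuming a full-duplex 100 Mbps port so that download and upload are checked separately against the same cap; it is not something the router runs.

```python
def rate_cap_mbps(port_speed_mbps: float, percent: int = 90) -> float:
    """Highest rate allowed on a port when leaving some headroom."""
    return port_speed_mbps * percent / 100


def within_cap(down_mbps: float, up_mbps: float, port_speed_mbps: float) -> bool:
    """True if both directions stay at or below the cap for this port."""
    cap = rate_cap_mbps(port_speed_mbps)
    return down_mbps <= cap and up_mbps <= cap


if __name__ == "__main__":
    # Limits configured in this setup: 90 Mbps down / 35 Mbps up.
    print("Cap for a 100 Mbps port:", rate_cap_mbps(100))
    print("90d/35u within cap:", within_cap(90, 35, 100))
    # Both ISP lines are sold as 100/40 Mbps, which would exceed the cap.
    print("100d/40u within cap:", within_cap(100, 40, 100))
```

In my setup the same 90d/35u Mbps figures are entered both as the WAN speeds on the TL-R470T+ and as the Internet bandwidth in the Deco QoS settings.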

 

What is the situation now?
* Download/upload speeds reach my target of 90d/35u Mbps on file downloads and on the Netflix speed test.

* I tested an extreme situation, and the CPU always stayed between 20% and 60% while all of the following ran concurrently:
  * 1x smart TV playing YouTube videos at 2160p (4K) quality.
  * 1x Android phone playing Netflix at regular speed.
  * 1x Windows laptop downloading a 10 GiB file from a public source.
  * 1x Windows laptop downloading an image from the Ubuntu website via qBittorrent.
  * 1x MacBook surfing and watching the TL-R470T+ status page (System Status > Resource Utilization).
  * 1x LibreNMS monitoring server polling the TL-R470T+ via SNMP every 1-5 minutes.
  * The test ran for 1-2 hours.

 

Why did I write this?
 

Because this post may be interesting for the people who took part in the following unresolved but closed forum threads:

* TL-R480T+ 100% CPU (Topic: 155679)
   https://community.tp-link.com/en/business/forum/topic/155679
* TL-R470T+ at 100% (Topic: 221484)
   https://community.tp-link.com/en/business/forum/topic/221484

 

Best,

Olaf

TP-Link User Since 2008