[UPDATE] Logical Sequence Analysis: IOCTL Communication Failures Triggering Global FDB Flushes

2026-03-04 17:54:16 - last edited 2 weeks ago
Tags: #Roaming_Logic
Model: Deco BE63  
Hardware Version: V2
Firmware Version: 1.2.10 Build 20251229 Rel. 42008

I am sharing an update based on further log analysis under different conditions.
 

Opportunity to optimize Loop Avoidance during Roaming/Steering
 

The "qca_delete_wds_entry: failed to send ioctl" errors are present in the logs, but they do not appear to be the primary trigger for the FDB flushes. The router frequently seems to handle these cases in the background using hardware acceleration (apsd_flush_fc_flow_by_ha_qca).
 

The FDB flush correlates directly with a temporary interface mapping loss during standard Band Steering or Roaming events. The sequence is as follows:


1. Transition: A client device moves between bands or nodes.
   daemon.err nrd[XXXXX]: wlanif_isSTAMoved: [MAC_ADDRESS] move from ath1 to athX.X

2. Interface Mapping Loss: Immediately after the move, the apsd daemon logs an error because it cannot find the mapping for the new interface.
   daemon.err /usr/bin/apsd: apsd_check_eth_has_neigh:1746: Error: can't find ifname[athX.X] in eth_ifname

3. FDB Flush: Following the mapping error, the Loop Avoidance mechanism executes a full FDB flush.
   daemon.err /usr/bin/apsd: loop_avoidance_flush_switch_fdb_table:1882: Error: set cmd:ssdk_sh fdb entry flush 1
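
For anyone who wants to check their own exported logs for the same three-step pattern, here is a minimal Python sketch. It is only an illustration: the regular expressions are built from the message fragments quoted above, the full line format of the Deco log is an assumption, and the function name find_flush_sequences and the max_gap window (counted in log lines, not seconds) are hypothetical choices of mine. The sample lines reuse the redacted placeholders from this post rather than real values.

```python
import re

# Patterns derived from the message fragments quoted in this post.
# The surrounding line format (timestamps, prefixes) is an assumption.
MOVE_RE = re.compile(r"wlanif_isSTAMoved: (\S+) move from (\S+) to (\S+)")
MAP_ERR_RE = re.compile(
    r"apsd_check_eth_has_neigh:\d+: Error: can't find ifname\[(\S+?)\] in eth_ifname"
)
FLUSH_RE = re.compile(r"loop_avoidance_flush_switch_fdb_table:\d+: .*fdb entry flush")


def find_flush_sequences(lines, max_gap=20):
    """Return (move_line, mapping_error_line, flush_line) triples.

    A triple is reported when a client move is followed by a mapping error
    for the same destination interface and then by an FDB flush, with each
    step at most max_gap log lines after the previous one.
    """
    sequences = []
    move_at = dest_if = map_err_at = None
    for no, line in enumerate(lines, start=1):
        move = MOVE_RE.search(line)
        if move:
            # A new move restarts the search for the sequence.
            move_at, dest_if, map_err_at = no, move.group(3), None
            continue
        err = MAP_ERR_RE.search(line)
        if err:
            same_if = err.group(1) == dest_if
            if move_at is not None and same_if and no - move_at <= max_gap:
                map_err_at = no
            continue
        if FLUSH_RE.search(line) and map_err_at is not None:
            if no - map_err_at <= max_gap:
                sequences.append((move_at, map_err_at, no))
            move_at = dest_if = map_err_at = None
    return sequences


# Sample built from the redacted lines quoted in this post.
sample = [
    "daemon.err nrd[XXXXX]: wlanif_isSTAMoved: [MAC_ADDRESS] move from ath1 to athX.X",
    "daemon.err /usr/bin/apsd: apsd_check_eth_has_neigh:1746: Error: can't find ifname[athX.X] in eth_ifname",
    "daemon.err /usr/bin/apsd: loop_avoidance_flush_switch_fdb_table:1882: Error: set cmd:ssdk_sh fdb entry flush 1",
]
print(find_flush_sequences(sample))
```

Flushes that are not preceded by a matching move and mapping error are ignored by this sketch, so comparing its result against the total number of flush lines gives a rough idea of how often the sequence above accounts for a flush.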


Suggestion for R&D: Reviewing the apsd daemon's interface handover logic (apsd_check_eth_has_neigh) to prevent mapping loss during standard client roaming could stop the Loop Avoidance mechanism from triggering unnecessary flushes, preserving mesh and backhaul stability.

Best regards.
 

 

 

*** ORIGINAL TEXT ***

 

To the TP-Link Engineering and Product Team,

 

 

Background: Following a detailed analysis of the system logs on the Deco BE63, a recurring sequence has been identified that points to a specific optimization opportunity in the error-handling logic between the nrd and apsd processes.

 

Observed Technical Sequence:

 

  1. Transition Event (Roaming/Band Steering): A wireless client initiates a standard band transition (e.g., 2.4 GHz to 5 GHz). The system attempts to update the WDS table to reflect the change in the device's interface.

     

  2. Internal Communication Failure (IOCTL): The Network Resource Daemon (nrd) fails to deliver the instruction that tells the hardware driver to delete the stale entry on the previous interface.

     

    • Technical Evidence: qca_delete_wds_entry: failed to send ioctl

       

  3. System Response (Global FDB Flush): Due to the inability to perform a surgical update on the MAC address table following the IOCTL failure, the system executes a global fail-safe measure, completely clearing the Forwarding Database (FDB).

     

    • Technical Evidence: ssdk_sh fdb entry flush 1

       

  4. Operational Impact: During the period the FDB table is empty, the system must relearn the location of all network devices via packet flooding. This results in perceptible latency and buffering pauses in continuous-stream applications, affecting even devices that were not involved in the initial roaming event.

 

Engineering Recommendation: It is recommended to review the error-handling sensitivity within the firmware. Ideally, a failure to delete a specific WDS entry should not result in a global FDB flush. A more resilient error management approach, one that isolates the failure to the affected client without impacting the global forwarding database, would significantly enhance stability in high-density device environments.
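
To illustrate why isolating the failure matters, here is a small Python toy model of a forwarding table. It is not the Deco firmware or the switch driver: the ToyFdb class, the client labels, and the port names are all hypothetical, and the model only counts how many entries have to be relearned after a global flush compared with a targeted removal of the roaming client's entry.

```python
from typing import Dict


class ToyFdb:
    """Toy MAC-to-port table; a conceptual stand-in, not the real switch FDB."""

    def __init__(self) -> None:
        self.table: Dict[str, str] = {}

    def learn(self, mac: str, port: str) -> None:
        self.table[mac] = port

    def flush_all(self) -> int:
        """Global flush: every entry is dropped and must be relearned."""
        removed = len(self.table)
        self.table.clear()
        return removed

    def delete_entry(self, mac: str) -> int:
        """Targeted removal: only the named client's entry is dropped."""
        return 1 if self.table.pop(mac, None) is not None else 0


def build_fdb() -> ToyFdb:
    # Hypothetical clients and ports, for illustration only.
    fdb = ToyFdb()
    for mac, port in [("client-a", "ath1"), ("client-b", "ath1"),
                      ("client-c", "eth0"), ("client-d", "eth1")]:
        fdb.learn(mac, port)
    return fdb


# client-a roams; compare the two ways of handling its stale entry.
global_fdb = build_fdb()
relearn_global = global_fdb.flush_all()

targeted_fdb = build_fdb()
relearn_targeted = targeted_fdb.delete_entry("client-a")

print(relearn_global, relearn_targeted)  # prints "4 1"
```

In this toy case the global flush forces all four clients to be relearned, while the targeted removal touches only the client that roamed. On a real network the gap grows with the number of connected devices.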


Best regards,
Nelson Lora
