all clients disconnect and reconnect across network of multiple TP Link devices

This thread has been locked for further replies. You can start a new thread to share your ideas or ask questions.
2023-03-14 14:47:46
Model: TL-SG3428  
Hardware Version: V2
Firmware Version: 2.0.10

Every client connected to my network keeps disconnecting and reconnecting, and I am looking for help to determine whether this is a product bug or a misconfiguration.

 

I've updated every switch and WAP firmware and rebooted all of them, but the problem persists.

 

Any advice on what to do?

 

 

Not shown in the diagram: I have a 2-port LAG between brain and switch-core, and a 2-port LAG between switch-core and switch-poe on the fibre ports.

Re:all clients disconnect and reconnect across network of multiple TP Link devices
2023-03-14 15:57:01 - last edited 2023-03-14 15:57:55

I have tried these additional things:

  • unplugged switch-core for 5 minutes
  • disconnected one of the LAG links between switch-core and switch-poe for a few minutes
  • disconnected switch-basement from the network for a few minutes

 

I still encounter constant disconnect/reconnect of clients (wired or wireless) regardless of these changes.

Re:all clients disconnect and reconnect across network of multiple TP Link devices
2023-03-14 20:50:31

  @BubbaBubba 

 

  • disconnected one link of LAG between brain and switch-core to no effect
Re:all clients disconnect and reconnect across network of multiple TP Link devices
2023-03-15 05:52:31

Hi @BubbaBubba 

 

Have you enabled Loop Detection on your core switch? (like the following screenshot)

In the log, is there anything like "xxxx port x has been blocked"?

I suspect there is a loop on your network, so some ports are being blocked by the switch automatically.

But to figure out where the loop is, you may need to check it physically.

Best Regards!
Re:all clients disconnect and reconnect across network of multiple TP Link devices
2023-03-15 13:35:26

@Hank21 

 

It appears that I had already enabled Loop Detection on my core switch. I assume that any detected loops would show up in the logs.

 

When I search for "blocked" in the logs I do not get any hits.

 

I will take your advice and physically disconnect all core switch ports, and assuming there are no problems, add them back one-by-one.
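
While ports are being added back one at a time, it helps to timestamp each drop so it can be matched against the port that was just reconnected. The following is only a minimal Python sketch, assuming reachability samples (for example, one ping result per second to a client) have already been collected as (timestamp, reachable) pairs. The function name `outage_intervals` and the sample data are hypothetical and not taken from this thread.

```python
from typing import Iterable, List, Tuple


def outage_intervals(samples: Iterable[Tuple[float, bool]]) -> List[Tuple[float, float]]:
    """Collapse (timestamp, reachable) samples into (start, end) outage windows.

    start is the first failed sample of an outage; end is the first successful
    sample after it, or the last timestamp seen if the outage never recovers.
    """
    intervals = []
    start = None  # timestamp of the first failure in the current outage
    last = None   # most recent timestamp seen
    for ts, ok in samples:
        if not ok and start is None:
            start = ts                      # outage begins
        elif ok and start is not None:
            intervals.append((start, ts))   # outage ends
            start = None
        last = ts
    if start is not None:
        intervals.append((start, last))     # still down at the end of the data
    return intervals


# Hypothetical samples: seconds since the test started, and whether a ping succeeded.
samples = [(0, True), (1, False), (2, False), (3, True), (4, True), (5, False)]
print(outage_intervals(samples))
```

Comparing the resulting windows with the times each cable was plugged back in should show whether drops line up with any particular port.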

Re:all clients disconnect and reconnect across network of multiple TP Link devices
2023-03-16 00:33:20

  @BubbaBubba 

 

My problem persists when only brain is connected to switch-core and all other links are physically removed. It persists whether one physical link of the LAG is up or both are.

So the problem is somewhere between switch-core and brain, my Fedora-based router/firewall/DNS/DHCP server.

 

Here's my routing table on brain:

 

root@brain ~
  # route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         96.234.217.1    0.0.0.0         UG    100    0        0 enp15s0
10.10.20.0      0.0.0.0         255.255.255.0   U     0      0        0 bond0.20
10.10.30.0      0.0.0.0         255.255.255.0   U     0      0        0 bond0.30
10.10.40.0      0.0.0.0         255.255.255.0   U     0      0        0 bond0.40
96.234.217.0    0.0.0.0         255.255.255.0   U     100    0        0 enp15s0
169.254.0.0     0.0.0.0         255.255.0.0     U     1009   0        0 bond0.20
169.254.0.0     0.0.0.0         255.255.0.0     U     1010   0        0 bond0.30
169.254.0.0     0.0.0.0         255.255.0.0     U     1011   0        0 bond0.40
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
192.168.0.0     192.168.1.1     255.255.0.0     UG    425    0        0 br0
192.168.1.0     0.0.0.0         255.255.255.0   U     425    0        0 br0
193.1.0.0       192.168.1.1     255.255.0.0     UG    425    0        0 br0
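
A table like this can be checked mechanically for destinations that appear more than once or that sit inside another route. The sketch below is a rough Python example using only the standard library; the helper names (`parse_routes`, `duplicate_destinations`, `nested_routes`) are hypothetical, and the rows are copied from the output above with the default and WAN routes left out. Duplicate or nested routes are not necessarily faults, so this only narrows down what to look at.

```python
import ipaddress
from collections import defaultdict

# Rows copied from the `route -n` output above (default and WAN routes omitted).
ROUTE_OUTPUT = """\
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.10.20.0      0.0.0.0         255.255.255.0   U     0      0        0 bond0.20
10.10.30.0      0.0.0.0         255.255.255.0   U     0      0        0 bond0.30
10.10.40.0      0.0.0.0         255.255.255.0   U     0      0        0 bond0.40
169.254.0.0     0.0.0.0         255.255.0.0     U     1009   0        0 bond0.20
169.254.0.0     0.0.0.0         255.255.0.0     U     1010   0        0 bond0.30
169.254.0.0     0.0.0.0         255.255.0.0     U     1011   0        0 bond0.40
192.168.0.0     192.168.1.1     255.255.0.0     UG    425    0        0 br0
192.168.1.0     0.0.0.0         255.255.255.0   U     425    0        0 br0
193.1.0.0       192.168.1.1     255.255.0.0     UG    425    0        0 br0
"""


def parse_routes(text):
    """Return (network, gateway, metric, iface) tuples from `route -n` text."""
    routes = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 8 or fields[0] == "Destination":
            continue  # skip the header and anything that is not a route row
        dest, gateway, mask, _flags, metric, _ref, _use, iface = fields
        net = ipaddress.ip_network(f"{dest}/{mask}")
        routes.append((net, gateway, int(metric), iface))
    return routes


def duplicate_destinations(routes):
    """Map each destination that appears more than once to its interfaces."""
    by_net = defaultdict(list)
    for net, _gw, _metric, iface in routes:
        by_net[net].append(iface)
    return {net: ifaces for net, ifaces in by_net.items() if len(ifaces) > 1}


def nested_routes(routes):
    """Return (inner, outer) pairs where one destination lies inside another."""
    nets = sorted({net for net, _gw, _metric, _iface in routes})
    return [(a, b) for a in nets for b in nets if a != b and a.subnet_of(b)]


routes = parse_routes(ROUTE_OUTPUT)
for net, ifaces in duplicate_destinations(routes).items():
    print(f"{net} appears on: {', '.join(ifaces)}")
for inner, outer in nested_routes(routes):
    print(f"{inner} is inside {outer}")
```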

 

I'll continue debugging tomorrow when I'm refreshed.

Re:all clients disconnect and reconnect across network of multiple TP Link devices
2023-03-16 03:51:15

Hi @BubbaBubba 

 

Do you have anything like a Sonos speaker on your network?

If you disable LAG/LACP and leave only a normal Ethernet connection between your switches, is there any improvement?

In your screenshot I can see that many records relate to "brain(VLAN20 gateway)". If possible, could you please share your VLAN settings on "brain" with us?

The switch uplink port (to brain) has the profile "All", correct?

 

 

Re:all clients disconnect and reconnect across network of multiple TP Link devices
2023-03-16 05:29:43

  @Hank21 

 

I do not have Sonos speakers, but I do have a lot of devices and other networks such as Z-Wave and Zigbee. I initially suspected the Zigbee network because it's relatively new and, IIRC, I've read that it can interfere with 2.4 GHz wireless. The first thing I did was disconnect the Zigbee controller (Home Assistant SkyConnect), but it didn't make a difference.

 

I did not disable LAG/LACP. After physically disconnecting all other Ethernet cables from switch-core, I disconnected one of the LAG Ethernet cables, waited 20 minutes, and still saw the disconnects/reconnects. I then reconnected that cable and disconnected the cable for the second LAG port, but I still observed the problem. Tomorrow I will completely disable LAG/LACP, go to a normal Ethernet connection, and see how that goes.
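
Because brain's side of the LAG is a Linux bond (the VLAN interfaces sit on bond0), the kernel's per-member counters in /proc/net/bonding/bond0 may show whether either member link is flapping. The following is a small Python sketch that reads the "Slave Interface", "MII Status" and "Link Failure Count" fields from that file's format. The sample text is illustrative only and was not posted in this thread; the member names `enp1s0f0` and `enp1s0f1` and both helper names are hypothetical.

```python
from typing import Dict, List

# Illustrative excerpt in the /proc/net/bonding/<bond> format (not real data).
SAMPLE = """\
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
MII Status: up

Slave Interface: enp1s0f0
MII Status: up
Link Failure Count: 0

Slave Interface: enp1s0f1
MII Status: up
Link Failure Count: 7
"""


def member_status(text: str) -> Dict[str, Dict[str, str]]:
    """Return {member: {"MII Status": ..., "Link Failure Count": ...}}."""
    members: Dict[str, Dict[str, str]] = {}
    current = None  # dict for the member section currently being read
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or a line without "key: value"
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = members.setdefault(value, {})
        elif current is not None and key in ("MII Status", "Link Failure Count"):
            current[key] = value
    return members


def flapping_members(text: str) -> List[str]:
    """List members whose link failure counter is above zero."""
    return [
        name
        for name, status in member_status(text).items()
        if int(status.get("Link Failure Count", "0")) > 0
    ]


print(member_status(SAMPLE))
print(flapping_members(SAMPLE))
```

On brain, the same functions could be fed the real file contents, for example `flapping_members(open("/proc/net/bonding/bond0").read())`. A failure count that keeps rising between runs would point at that cable, NIC port, or switch port.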

 

The switch uplink LAG port does have the profile "All".

 

For the VLANs on brain, I'm not sure which information is most relevant, but I'd be happy to share more. About 30 minutes ago I brought the link down on all of the VLAN interfaces on brain, but that doesn't seem to be helping either.

 

root@brain ~
  # ip addr show dev bond0.20 ; ip addr show dev bond0.30 ; ip addr show dev bond0.40
9: bond0.20@bond0: <BROADCAST,MULTICAST> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether ba:92:f7:ae:50:d1 brd ff:ff:ff:ff:ff:ff
    inet 10.10.20.1/24 brd 10.10.20.255 scope global bond0.20
       valid_lft forever preferred_lft forever
10: bond0.30@bond0: <BROADCAST,MULTICAST> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether ba:92:f7:ae:50:d1 brd ff:ff:ff:ff:ff:ff
    inet 10.10.30.1/24 brd 10.10.30.255 scope global bond0.30
       valid_lft forever preferred_lft forever
11: bond0.40@bond0: <BROADCAST,MULTICAST> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether ba:92:f7:ae:50:d1 brd ff:ff:ff:ff:ff:ff
    inet 10.10.40.1/24 brd 10.10.40.255 scope global bond0.40
       valid_lft forever preferred_lft forever

 

Note that with the VLAN interfaces down my routing table looks like this:

 

root@brain ~
  # route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         xx.xx.xx.xx     0.0.0.0         UG    100    0        0 enp15s0
xx.xx.xx.xx     0.0.0.0         255.255.255.0   U     100    0        0 enp15s0
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
192.168.0.0     192.168.1.1     255.255.0.0     UG    425    0        0 br0
192.168.1.0     0.0.0.0         255.255.255.0   U     425    0        0 br0

 

 

Re:all clients disconnect and reconnect across network of multiple TP Link devices
2023-03-16 05:43:02

  @BubbaBubba 

 

A few things I'm a little unsure of.

 

1)

In Omada -> Settings -> Wired Networks -> LAN, I have the following settings. Is it correct for the VLAN "Purpose" to be VLAN and not Interface? I'm not doing any routing on the switches, so I think this is correct: the "Interface" setting wants subnet/gateway information, which I think is only relevant when using an Omada Gateway/Router. I did try both "Purposes" and neither has resolved my issue.

 

 

2)

In Insights -> Routing Table -> Switch

 

I am not sure why there are routes with next-hop 192.168.0.1. There isn't anything at that IP, and I don't know what is putting those next-hops into the table. My current thinking is that this isn't a problem: I'm not doing routing on these switches, so this routing table should be irrelevant. But I don't understand why there's a routing table at all.

 

Also, the routing table changes every few minutes: routes for some of the switches disappear and then reappear. In the screenshots below you can see the routes for switch-familyroom disappear and reappear. At this point I'm not sure whether it's always switch-familyroom or whether different switches' routing entries are being removed and added. As stated above, I'm unsure if this is related, since I'm not doing any routing on the switches; that's all done on brain.
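
One way to settle whether it is always the same switch would be to record the table at intervals and count which devices' entries change between consecutive snapshots. The sketch below is a minimal Python example that assumes each snapshot has been transcribed by hand into a set of (device, destination, next_hop) tuples. The helper names and the sample entries are hypothetical; only the device names and the 192.168.0.1 next-hop come from this thread.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Route = Tuple[str, str, str]  # (device, destination, next_hop)


def diff_snapshots(before: Set[Route], after: Set[Route]) -> Dict[str, Set[Route]]:
    """Return the routes removed and added between two snapshots."""
    return {"removed": before - after, "added": after - before}


def churn_by_device(snapshots: List[Set[Route]]) -> Counter:
    """Count, per device, how many route entries changed across snapshots."""
    churn: Counter = Counter()
    for before, after in zip(snapshots, snapshots[1:]):
        # Symmetric difference: entries present in exactly one of the two snapshots.
        for device, _dest, _hop in before ^ after:
            churn[device] += 1
    return churn


# Hypothetical snapshots taken a few minutes apart.
core = ("switch-core", "10.10.20.0/24", "192.168.0.1")
family = ("switch-familyroom", "10.10.20.0/24", "192.168.0.1")
snapshots = [{core, family}, {core}, {core, family}]

print(diff_snapshots(snapshots[0], snapshots[1]))
print(churn_by_device(snapshots))
```

If one device dominates the counts over a longer run, that switch is the one to inspect first; evenly spread counts would suggest something upstream of all of them.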

 

 

 

Re:all clients disconnect and reconnect across network of multiple TP Link devices
2023-03-16 10:27:53

Hi @BubbaBubba 

 

To better assist you, I've created a support ticket via your registered email address and escalated it to our support engineer to look into the issue. The ticket ID is TKID230327125. Please check your inbox and make sure the support email has arrived. Thanks!

Re:all clients disconnect and reconnect across network of multiple TP Link devices
2023-03-16 11:49:23

  @Hank21 

 

Thank you. I have engaged the support engineer over email.
