Active/Passive LACP between TL-SG3428 and Synology NAS (DS1815+) not working while Static LAG does
Hello folks,
Today I tried to set up Link Aggregation Control Protocol (LACP) between my TL-SG3428 switch (running FW 2.0.11) and an old Synology DS1815+ NAS (also running an old DSM 6.2.3) via the Omada Software Controller 5.12.7.
I therefore followed these two guides: https://community.tp-link.com/en/business/forum/topic/609710 (primarily) and https://www.tp-link.com/de/support/faq/3689/ (for general reference)
My settings are equivalent to the first guide above (i.e. identical apart from the concrete IPs, subnet and ports), except that the LAG ID is 2, because I already have another LAG defined as an uplink to another Omada switch (which works perfectly).
Configurations look like this:
The problem I am facing is that even though both devices (switch + NAS) show the LACP connection as successfully established, nothing goes over this LAG: no ping, no DHCP, etc.
However, if I completely remove the LAG again and set up both devices' configurations as "Static LAG", everything works and I can access the NAS etc.
I know this could be an error on either side, but as the connection is showing as active on both devices, it seems like a potential problem with the "profile", as discussed e.g. in a (non-Omada) setting here: www . synology-forum . de/threads/link-aggregation-ds214-tplink-switch-tl-sg3210.46795/post-375396 [a German thread!]
Since the configuration per se is not particularly complicated but relatively straightforward, I'm afraid I don't really know what else to do to get the aggregation with "active LACP" working.
I would be very happy about any help!
What I should also add is that I am not a complete noob when it comes to networking. So of course I ensured that the firewall is off, the IPs are in the correct subnet etc. I am also not using any VLANs or ACLs in this small network that would block anything. As said, as soon as I set up the bond as "static LAG" instead of "active LACP" or "passive LACP" on both the NAS and the switch, with all other settings being equal, everything works like a charm.
Hi @ThomFri
Thanks for posting in our business forum.
What is your config of the Synology?
If one side is Active, the other is supposed to be Passive. Did you set the Synology to Passive?
There isn't much either of us can configure for LACP. It should just work, and the group ID does not matter. It should be intuitive. I'd suggest not connecting the cables before you have finished configuring LACP.
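For reference, IEEE 802.1AX (the successor of 802.3ad) does not strictly require one Active and one Passive end: negotiation starts as long as at least one side is Active, so Active/Active and Active/Passive both work, while Passive/Passive never exchanges LACPDUs. A minimal sketch of that rule; `LacpMode` and `lacp_can_negotiate` are hypothetical names used only for illustration:

```python
from enum import Enum


class LacpMode(Enum):
    ACTIVE = "active"
    PASSIVE = "passive"


def lacp_can_negotiate(local: LacpMode, partner: LacpMode) -> bool:
    """Return True if LACPDUs will be exchanged on the link.

    A Passive port only transmits LACPDUs once it sees an Active partner,
    so at least one side has to be Active for negotiation to start.
    """
    return LacpMode.ACTIVE in (local, partner)


if __name__ == "__main__":
    for a in LacpMode:
        for b in LacpMode:
            print(f"{a.value:>7} / {b.value:<7} -> {lacp_can_negotiate(a, b)}")
```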
To dig deeper, you would have to use Wireshark to check that, but I am not sure if you can. LACP periodically sends packets (LACPDUs) to maintain the connection. I'd like to see the Wireshark result if necessary.
How to capture packets using Wireshark on SMB router or switch
How to Use Port Mirror to Capture Packets in the Controller
The Synology forum thread you linked does not really apply here. It is a 2013 post, and the TL-SG3210 is not the same as it was a decade ago, so I don't see why it should be referred to here. The system is basically the same as on other models. You have already referred to two articles, and both show that LACP works on Omada switches as well as with Synology LACP. I am more inclined to suspect the Synology or your config.
There are static LAGs and dynamic LAGs. The dynamic ones are simply LAGs with LACP and the static ones are LAGs without LACP. LACP is added to a LAG to ensure its proper configuration, monitor its status and react to that status quickly so there is no or minimal packet loss. If your static LAG works and the dynamic LAG does not, it is most likely because there is some hidden issue with that LAG and LACP has discovered it. Therefore, you need to check the status of that LAG. Unfortunately, the Synology NAS does not provide any details about it. To check the status of a LAG on the NAS side, you need to SSH into the NAS and use Linux commands. I do not know what kind of status information, if any, the Omada Controller offers for it, but I wouldn't trust it anyway. Again, SSH into the switch and use CLI commands. Unfortunately, I cannot provide any sample Linux or CLI commands since I'm on a trip right now with no access to my lab. Good luck!
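As an example of such a Linux-side check: the Linux bonding driver normally reports the LACP state of a bond in /proc/net/bonding/bond0, so `cat /proc/net/bonding/bond0` over SSH should show one block per slave. The sketch below assumes DSM exposes this standard file with the upstream kernel layout (not verified on DSM 6.2/7.1); `parse_bonding` and `check_bond` are hypothetical helpers. It flags slaves that are down, not at 1000 Mbps full duplex, or attached to a different aggregator than the active one.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Slave:
    name: str
    mii_status: str = "unknown"
    speed_mbps: Optional[int] = None   # None when the driver prints "Unknown"
    duplex: str = "unknown"
    aggregator_id: Optional[int] = None


def _to_int(value: str) -> Optional[int]:
    m = re.match(r"\d+", value)
    return int(m.group()) if m else None


def parse_bonding(text: str) -> Tuple[Optional[int], List[Slave]]:
    """Parse the text of /proc/net/bonding/<bond>.

    Returns (active aggregator ID, list of slaves). Lines before the first
    "Slave Interface:" belong to the bond itself, including the
    "Active Aggregator Info" block printed in 802.3ad mode.
    """
    active_agg: Optional[int] = None
    slaves: List[Slave] = []
    for raw in text.splitlines():
        key, sep, value = raw.strip().partition(":")
        if not sep:
            continue  # headings such as "802.3ad info"
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            slaves.append(Slave(name=value))
        elif not slaves:
            if key == "Aggregator ID" and active_agg is None:
                active_agg = _to_int(value)
        else:
            cur = slaves[-1]
            if key == "MII Status":
                cur.mii_status = value
            elif key == "Speed":
                cur.speed_mbps = _to_int(value)
            elif key == "Duplex":
                cur.duplex = value
            elif key == "Aggregator ID" and cur.aggregator_id is None:
                cur.aggregator_id = _to_int(value)
    return active_agg, slaves


def check_bond(text: str, expected_mbps: int = 1000) -> List[str]:
    """List problems: slave down, wrong speed/duplex, or outside the active aggregator."""
    active_agg, slaves = parse_bonding(text)
    problems = []
    for s in slaves:
        if s.mii_status != "up":
            problems.append(f"{s.name}: link is {s.mii_status}")
        if s.speed_mbps != expected_mbps or s.duplex != "full":
            problems.append(f"{s.name}: {s.speed_mbps} Mbps / {s.duplex} duplex")
        if active_agg is not None and s.aggregator_id != active_agg:
            problems.append(f"{s.name}: in aggregator {s.aggregator_id}, active is {active_agg}")
    return problems


if __name__ == "__main__":
    import sys
    path = sys.argv[1] if len(sys.argv) > 1 else "/proc/net/bonding/bond0"
    try:
        with open(path) as f:
            found = check_bond(f.read())
    except OSError as exc:
        print(f"cannot read {path}: {exc}")
    else:
        print("\n".join(found) or "all slaves look healthy")
```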
Thank you very much for your quick replies, and sorry for my late reply (I was also on the road for some days and wanted to rule out some things before replying).
As @KJK already posted, there isn't much to configure on the Synology side apart from the settings in the screenshot I already attached. There I have tried "IEEE 802.3ad", which fails, and "Balance XOR" (which is static LA), which does (or at least seemed to) work. I therefore kept the static configuration for some days. However, I noticed that the virus scanner "ESET Antivirus", which is installed on one of my Windows client machines, regularly warned that the bond IP (192.168.221.35) was sending malicious packets. This now makes me suspect what @KJK already suggested: there might also be something wrong with the LAG configuration itself.
To rule out more things, I upgraded the Synology NAS to a more recent firmware version (DSM 7.1.1-42962 Update 6), which, as expected, did not fix the situation. :'(
I therefore tried what you two recommended, with the following results:
Port Mirroring & Wireshark
As I have done port mirroring for debugging quite often already, I had thought about doing this, but as I expected that the LACP packets might not be transferred over the mirror, I did not try it in advance. Sorry for that.
However, when I set it up and hooked everything up, I could see nothing on the line (except the packets coming from the client itself on which Wireshark is running). This is also somewhat strange, as the LEDs of the bond ports are flashing very frequently while the mirroring port only flashes very rarely.
=> So as I said: I was not able to see anything here... :/
Checking Status and Configuration via SSH
Before diving into the output, I'll first give you some background info about the ports/ips:
- TL-SG3428:
- IP: 192.168.221.162
- Port 1-4:
- Passive LACP
- Connected to another switch in my network (which has active LACP on its side - everything is working like a charm here)
- Port 8:
- Without any special configuration
- Connected to synology nas management port eth3 (see below)
- Port 9-11:
- Active LACP
- Connected to synology nas bond0 (see below)
- Synology NAS:
- Port eth0-eth2 = bond0:
- IP: 192.168.221.35
- LACP enabled (as said above; without knowing any further details :/ )
- Port eth3:
- IP: 192.168.221.21
- Management port (to access web management interface, ssh, ...)
As @KJK suggested, I looked at both sides' configurations and status using the CLI. I checked the TL-SG3428 CLI Manual for anything that had to do with LAG/LACP (which, admittedly, was not much), but it at least provided some more details:
TL-SG3428#show lacp internal
Flags: S - Device is requesting Slow LACPDUs
       F - Device is requesting Fast LACPDUs
       A - Device is in active mode
       P - Device is in passive mode

Channel group 1
                        LACP port Admin  Oper    Port    Port
Port      Flags  State  Priority  Key    Key     Number  State
Gi1/0/1   SP     Up     32768     0x1    0xf93   0x1     0x3c
Gi1/0/2   SP     Up     32768     0x1    0xf93   0x2     0x3c
Gi1/0/3   SP     Up     32768     0x1    0xf93   0x3     0x3c
Gi1/0/4   SP     Up     32768     0x1    0xf93   0x4     0x3c

Channel group 2
                        LACP port Admin  Oper    Port    Port
Port      Flags  State  Priority  Key    Key     Number  State
Gi1/0/9   SA     Up     32768     0x2    0x1d3   0x9     0x5
Gi1/0/10  SA     Up     32768     0x2    0x1d3   0xa     0x5
Gi1/0/11  SA     Up     32768     0x2    0x1d3   0xb     0x5
And on the NAS, I checked the Synology CLI manual and ran some other Linux commands:
root@DS1815plus:~# synonet --show
System network interface list:
Host Name: DS1815plus
Network interface: bond0
Manual IP
IP: 192.168.221.35
Mask: 255.255.255.0
Gateway: 192.168.221.250
DNS: 192.168.221.250
MTU Setting: 1500
-1, unknown duplex, active mtu 1500
RX bytes: 718241
TX bytes: 27458
Host Name: DS1815plus
Network interface: eth3
Manual IP
IP: 192.168.221.21
Mask: 255.255.255.0
Gateway: 192.168.221.250
DNS: 192.168.221.250
MTU Setting: 1500
1000, full duplex, active mtu 1500
RX bytes: 414357843244
TX bytes: 1354786323123
root@DS1815plus:~# synonet --check 192.168.221.35
OK! Valid network interface parameters.
root@DS1815plus:~# synonet --la_get bond0
[bond0]
Mode: 802.3ad
Error Setted: false
Slaves: eth0 eth1 eth2
root@DS1815plus:~# synonet --la_exist
bonding is exist
root@DS1815plus:~# ifconfig
bond0 Link encap:Ethernet HWaddr XX:XX:XX:XX:XX:XX
inet addr:192.168.221.35 Bcast:192.168.221.255 Mask:255.255.255.0
UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
RX packets:7391 errors:0 dropped:0 overruns:0 frame:0
TX packets:304 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:941069 (919.0 KiB) TX bytes:34898 (34.0 KiB)
[...]
eth0 Link encap:Ethernet HWaddr XX:XX:XX:XX:XX:XX
UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
RX packets:2401 errors:0 dropped:0 overruns:0 frame:0
TX packets:106 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:299434 (292.4 KiB) TX bytes:11870 (11.5 KiB)
eth1 Link encap:Ethernet HWaddr XX:XX:XX:XX:XX:XX
UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
RX packets:2605 errors:0 dropped:0 overruns:0 frame:0
TX packets:116 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:343659 (335.6 KiB) TX bytes:12860 (12.5 KiB)
eth2 Link encap:Ethernet HWaddr XX:XX:XX:XX:XX:XX
UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
RX packets:2385 errors:0 dropped:0 overruns:0 frame:0
TX packets:82 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:297976 (290.9 KiB) TX bytes:10168 (9.9 KiB)
eth3 Link encap:Ethernet HWaddr XX:XX:XX:XX:XX:XX
inet addr:192.168.221.21 Bcast:192.168.221.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:678574299 errors:1 dropped:374 overruns:374 frame:1
TX packets:918066788 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:415215767429 (386.6 GiB) TX bytes:1355069732058 (1.2 TiB)
So the only thing I noticed was that the "port state" and "oper mode" values differed, but that seems to be related only to the link speed and the active/passive mode. I also noticed that the NAS is not sure about the connection speed of the bond (manually setting it to 1000 Full Duplex did not change the CLI output either).
Having said that, it is obviously still not working, even though both devices show the link as active and working.
I would be happy about any further hints!
Thank you very much guys!
The most basic LAG parameters checked by LACP are:
- Same port speed
- Full duplex
You want to have 1 Gbit/s and full duplex on each of your three links.
I see you have highlighted some issues. On the switch, Port State 0x5 (hex) is 00000101 (binary): Bit 0 (Activity) and Bit 2 (Aggregation) are set, but Bit 3 (Synchronization), Bit 4 (Collecting) and Bit 5 (Distributing) are all 0, whereas the working LAG shows 0x3c (00111100) with all four of Bits 2 to 5 set. This means LACP on the switch has not brought the ports on the NAS LAG into sync and will not pass traffic over them. On the NAS, "-1, unknown duplex" does not look good, either. You want to see "3000, full duplex" instead (3000 is for 3 x 1000M). It looks like there is something wrong with the links between the switch and the NAS. Changing some port/link configuration will not help.
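The bit layout of the LACP state octet is fixed by IEEE 802.1AX (bit 0 = LACP_Activity up to bit 7 = Expired, least-significant bit first), so the hex values in the CLI outputs in this thread can be decoded mechanically. A minimal sketch with hypothetical function names; applied to the values from this thread, it reads 0x3c on the working LAG as Aggregation, Synchronization, Collecting and Distributing, and 0x5 on the NAS LAG as LACP_Activity and Aggregation only, i.e. the port is aggregatable but not yet in sync.

```python
# LACP actor/partner state octet (IEEE 802.1AX), bit 0 = least-significant bit.
LACP_STATE_BITS = [
    "LACP_Activity",    # bit 0: 1 = Active, 0 = Passive
    "LACP_Timeout",     # bit 1: 1 = short timeout (fast LACPDUs), 0 = long (slow)
    "Aggregation",      # bit 2: 1 = link may be aggregated
    "Synchronization",  # bit 3: 1 = port is in sync with its aggregator
    "Collecting",       # bit 4: 1 = receiving frames on the LAG is enabled
    "Distributing",     # bit 5: 1 = sending frames on the LAG is enabled
    "Defaulted",        # bit 6: 1 = using administratively configured partner info
    "Expired",          # bit 7: 1 = receive machine is in the EXPIRED state
]


def decode_lacp_state(value: int) -> dict:
    """Map a state octet such as 0x3c to {flag_name: bool}."""
    if not 0 <= value <= 0xFF:
        raise ValueError("the LACP state is a single octet")
    return {name: bool((value >> bit) & 1) for bit, name in enumerate(LACP_STATE_BITS)}


def is_forwarding(value: int) -> bool:
    """True only if the port is in sync and both collecting and distributing."""
    s = decode_lacp_state(value)
    return s["Synchronization"] and s["Collecting"] and s["Distributing"]


if __name__ == "__main__":
    # Port State values taken from the switch CLI output in this thread.
    samples = [("LAG1 switch", 0x3C), ("LAG1 partner", 0x3D),
               ("LAG2 switch, active", 0x05), ("LAG2 switch, passive", 0x04),
               ("LAG2 partner (NAS)", 0x0F)]
    for label, v in samples:
        on = [name for name, flag in decode_lacp_state(v).items() if flag]
        print(f"{label:22} 0x{v:02x} = {v:08b}  forwarding={is_forwarding(v)}  {', '.join(on)}")
```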
If I were you, I would remove all ports from the bond/LAG, set their speed to Auto and then check the status of each link one by one. If a link does not come up at 1 Gbit/s and full duplex, there is an issue with that link, most likely a wiring issue.
Hi @ThomFri
Thanks for posting in our business forum.
Use the "show etherchannel" command and view the port state value, then convert it from hexadecimal to binary. If you see 111, that means the link is up. If not, 000 means it is not linking up, which is a failure in the negotiation.
How did you mirror it? You should see the LACP frames being sent from the switch ports. Our switch works at layer 2 and layer 3; with "lacp" as the filter, you should see an LACPDU from the switch every 30 seconds if it is in active LACP mode.
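For reference when filtering a mirrored capture: LACPDUs are Slow Protocols frames sent to the multicast address 01:80:C2:00:00:02 with EtherType 0x8809 and subtype 0x01. A minimal sketch that recognizes such a frame in raw bytes and extracts the actor and partner information, assuming an untagged Ethernet II frame and the fixed LACPDU layout (Actor TLV at byte 16, Partner TLV at byte 36). The helper names are hypothetical; in Wireshark itself, the display filter `lacp` or the capture filter `ether proto 0x8809` should select the same frames.

```python
import struct
from typing import NamedTuple, Optional, Tuple

SLOW_PROTOCOLS_MAC = bytes.fromhex("0180c2000002")
SLOW_PROTOCOLS_ETHERTYPE = 0x8809
LACP_SUBTYPE = 0x01


class LacpInfo(NamedTuple):
    system: str   # system MAC, e.g. "00:11:32:40:d5:57"
    key: int
    port: int
    state: int    # state octet, e.g. 0x3c


def _tlv(frame: bytes, offset: int, expected_type: int) -> LacpInfo:
    # Actor/Partner TLV: type(1) len(1) sys_prio(2) system(6) key(2) port_prio(2) port(2) state(1) reserved(3)
    tlv_type, length, _prio, system, key, _pprio, port, state = struct.unpack_from(
        "!BBH6sHHHB", frame, offset)
    if tlv_type != expected_type or length != 20:
        raise ValueError(f"unexpected TLV {tlv_type}/{length} at offset {offset}")
    return LacpInfo(":".join(f"{b:02x}" for b in system), key, port, state)


def parse_lacpdu(frame: bytes) -> Optional[Tuple[LacpInfo, LacpInfo]]:
    """Return (actor, partner) if frame is an untagged LACPDU, else None."""
    if len(frame) < 56:
        return None
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if frame[:6] != SLOW_PROTOCOLS_MAC or ethertype != SLOW_PROTOCOLS_ETHERTYPE:
        return None
    if frame[14] != LACP_SUBTYPE:
        return None  # another Slow Protocol, e.g. the Marker protocol (subtype 0x02)
    return _tlv(frame, 16, 0x01), _tlv(frame, 36, 0x02)
```

The state octet returned for each side can then be decoded bit by bit.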
And for a bad link within the LAG, that might be a problem with the hash algorithm. What hash algorithm are you using?
Let's make sure the LACP is sending the sync packet first.
Thanks again for your contributions and suggestions! What I have tried and found out:
Link Speeds
- As soon as I deleted the LACP config (on both sides), all links immediately showed 1000/Full Duplex as they should.
- (To rule out more failures, I also replaced the Ethernet cables at this point)
- Set 1000/Full Duplex manually on all respective ports
- Recreated the LAG with "Passive LACP" (see below) and also set the LAG's speed manually to 1000/Full Duplex
=> No changes; same cli outputs; issue remains
Etherchannel CLI
- When I first ran "show etherchannel detail", I noticed that the Synology NAS is (obviously) the active partner here, as the Flags column was showing SA, which is why I set the LAG up again as "Passive LACP" on the switch. [After googling this, I also found posts in Cisco forums where the issue was solved simply by setting LACP to passive mode. That was not the solution in my case, however, and as I said above, the post by @NinjaMonkey (https://community.tp-link.com/en/business/forum/topic/609710) also uses active LACP and everything seems to work there as well.]
- After changing to passive LACP, the outputs look like this (the NAS-switch LAG is group 2):
TL-SG3428#show etherchannel detail
Group: 1
----------
Group state = L2
Ports: 4   MaxPorts = 16
Protocol: LACP
Ports in the group:
-------------------
Flags: S - Device is sending Slow LACPDUs
       F - Device is sending fast LACPDUs.
       A - Device is in active mode.
       P - Device is in passive mode.

Local information:
                        LACP port Admin  Oper    Port    Port
Port      Flags  State  Priority  Key    Key     Number  State
Gi1/0/1   SP     Up     32768     0x1    0xf93   0x1     0x3c
Gi1/0/2   SP     Up     32768     0x1    0xf93   0x2     0x3c
Gi1/0/3   SP     Up     32768     0x1    0xf93   0x3     0x3c
Gi1/0/4   SP     Up     32768     0x1    0xf93   0x4     0x3c

Partner's information:
                 LACP port                 Oper    Port    Port
Port      Flags  Priority  Dev ID          Key     Number  State
Gi1/0/1   SA     32768     2887.baf3.b4c9  0xc40   0x15    0x3d
Gi1/0/2   SA     32768     2887.baf3.b4c9  0xc40   0x16    0x3d
Gi1/0/3   SA     32768     2887.baf3.b4c9  0xc40   0x17    0x3d
Gi1/0/4   SA     32768     2887.baf3.b4c9  0xc40   0x18    0x3d

Group: 2
----------
Group state = L2
Ports: 3   MaxPorts = 16
Protocol: LACP
Ports in the group:
-------------------
Flags: S - Device is sending Slow LACPDUs
       F - Device is sending fast LACPDUs.
       A - Device is in active mode.
       P - Device is in passive mode.

Local information:
                        LACP port Admin  Oper    Port    Port
Port      Flags  State  Priority  Key    Key     Number  State
Gi1/0/9   SP     Up     32768     0x2    0x615   0x9     0x4
Gi1/0/10  SP     Up     32768     0x2    0x615   0xa     0x4
Gi1/0/11  SP     Up     32768     0x2    0x615   0xb     0x4

Partner's information:
                 LACP port                 Oper    Port    Port
Port      Flags  Priority  Dev ID          Key     Number  State
Gi1/0/9   FA     255       0011.3240.d557  0x11    0x1     0xf
Gi1/0/10  FA     255       0011.3240.d557  0x11    0x2     0xf
Gi1/0/11  FA     255       0011.3240.d557  0x11    0x3     0xf

TL-SG3428#show etherchannel summary
Flags: D - down
       P - bundled in port-channel
       U - in use
       I - stand-alone
       H - hot-standby(LACP only)
       s - suspended
       R - layer3
       S - layer2
       f - failed to allocate aggregator
       u - unsuitable for bundling
       w - waiting to be aggregated
       d - default port

Group  Port-channel  Protocol  Ports
------+-------------+---------+-------------------------------------------------
1      Po1(S)        LACP      Gi1/0/1(P) Gi1/0/2(P) Gi1/0/3(P) Gi1/0/4(P)
2      Po2(S)        LACP      Gi1/0/9(s) Gi1/0/10(s) Gi1/0/11(s)
- This is (if I got that right) what @KJK meant: the links stay at Port State 0x5 (or 0x4 now with passive mode) and are flagged as "suspended".
- I therefore searched Google for things like "lacp suspended synology", which did not lead to promising results either; most of them were Cisco forum posts where either only one link was in "suspended" mode or where setting LACP to passive did the trick.
=> Also no success here
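When re-checking after every cable or config change, it may help to pull the per-port flags out of `show etherchannel summary` automatically. A minimal sketch that assumes the column layout shown in the output above (group number, `PoN(...)`, protocol, then `Port(flag)` entries); the function names are hypothetical and the pattern may need adjusting for other firmware versions:

```python
import re
from typing import Dict, List

# One port entry in the "Ports" column, e.g. "Gi1/0/9(s)" -> ("Gi1/0/9", "s").
PORT_RE = re.compile(r"(\S+?)\(([A-Za-z])\)")


def parse_summary(text: str) -> Dict[int, Dict[str, str]]:
    """Map LAG group number -> {port: flag} from 'show etherchannel summary' text."""
    groups: Dict[int, Dict[str, str]] = {}
    for line in text.splitlines():
        fields = line.split()
        # Group rows start with the group number followed by "PoN(...)".
        if len(fields) >= 3 and fields[0].isdigit() and fields[1].startswith("Po"):
            groups[int(fields[0])] = dict(PORT_RE.findall(" ".join(fields[3:])))
    return groups


def not_bundled(groups: Dict[int, Dict[str, str]]) -> List[str]:
    """Describe every member port whose flag is not P (bundled in port-channel)."""
    return [f"{port} in group {g}: {flag}"
            for g, members in sorted(groups.items())
            for port, flag in members.items() if flag != "P"]
```

For the summary above, `not_bundled` would report Gi1/0/9, Gi1/0/10 and Gi1/0/11 in group 2 with flag `s` (suspended).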
Wireshark & Searching the LACP Packets
- My mirroring config looks like this:
- As I had lots of noise without a filter last time and filtering by source/destination MAC/IP led to an empty list, I googled and found this page: https://wiki.wireshark.org/LinkAggregationControlProtocol.md, which I then followed
- I also tried unplugging and reconnecting all/some links, but with the same result: an empty Wireshark capture.
Hashing Algorithm...
... is set to default, i.e. "SRC MAC + DST MAC"
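To illustrate what a "SRC MAC + DST MAC" policy implies: every frame between the same pair of MAC addresses is hashed to the same member link, so a single NAS-to-client conversation never uses more than one link. TP-Link does not document its exact hash function here, so the sketch below uses a hypothetical XOR of the last octet of both addresses modulo the number of active links, similar in spirit to the Linux bonding driver's layer2 policy; it is not the switch's actual algorithm.

```python
from collections import Counter


def mac_to_bytes(mac: str) -> bytes:
    """Accept 00:11:32:40:d5:57, 00-11-32-40-d5-57 or 0011.3240.d557."""
    return bytes.fromhex(mac.replace(":", "").replace("-", "").replace(".", ""))


def pick_link(src_mac: str, dst_mac: str, n_links: int) -> int:
    """Index (0..n_links-1) of the member link a frame would be sent on.

    Hypothetical hash: XOR of the last octets of source and destination MAC,
    modulo the number of active links. Swapping src/dst gives the same link.
    """
    if n_links < 1:
        raise ValueError("a LAG needs at least one active link")
    h = mac_to_bytes(src_mac)[-1] ^ mac_to_bytes(dst_mac)[-1]
    return h % n_links


if __name__ == "__main__":
    nas = "0011.3240.d557"  # NAS system ID from the etherchannel output above
    clients = [f"02:00:00:00:00:{i:02x}" for i in range(1, 13)]  # made-up client MACs
    for n in (2, 3):
        usage = Counter(pick_link(nas, c, n) for c in clients)
        print(f"{n} links:", dict(sorted(usage.items())))
```

Running it prints how many of the twelve example clients would land on each link for a 2-link and a 3-link LAG.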
Thanks a lot again guys and I hope I didn't miss anything! I will also try a different switch that I have as a backup in my environment later today/tomorrow to rule out any misconfig, bug etc. on that part as well
Let me first compliment you on the work you are doing to resolve this issue. You try suggestions and come back with results as well as you do your own research. That’s how it should be done!
I’m also glad that the wiring problem has been ruled out. Now, I can suggest only two more things. The first one is to check the logs for any anomalies. That’s actually what I should’ve suggested in the first place. The second one is to remove one link from the LAG. I didn’t think about it initially, but I believe there should be an even number of links in a LAG. I vaguely remember somebody asking a question about it on some other forum and, after all, the hash algorithms used in LAGs are binary.
I do not recommend using the passive mode. I don’t see any need for it. Many switches, quite feature-rich, do not even give you that option anymore. If I want a particular switch to be in control of a LAG, I use the priority setting. I also do not recommend using any speed option other than Auto. If ports cannot negotiate the proper speed, there must be some issue with the link and it should be resolved before the link is put into a LAG. That’s based on my own experience. I have several LACP LAGs in my network. Switch to switch, switch to Synology NAS, copper in both links, copper in one link and fiber in the other, 10G, 2.5G, 1G. Even a 1G Synology NAS connected to a 10G switch. They all work fine without any need to play with those settings.
Thank you very much for your compliment! You and @Clive_A really were a great help to me. I didn't expect such great support when I wrote this post (in my quite baffled and somewhat crestfallen mood after trying various configs). So a really big thanks to you guys!!
Even though I am happy that it works now, I'm quite ashamed, and even sad, to admit what the solution was... believe it or not, a simple reboot of the switch fixed EVERYTHING.
Now everything @KJK said really applies (and it matches what I know about LACP): you can hardly misconfigure it badly enough for it to stop working. Regardless of what I configure (active, passive, 2 links, 3 links, auto speed, manual speed), everything just works...
Just to let you and other people with similar problems know: before that, I had of course already checked what @KJK proposed:
2 links only:
- Did not change the situation, the links were still marked as "suspended".
Logs:
- I have to admit that I had not used the CLI much before this problem, so I did not look at the switch log at the beginning of this discussion.
- However, I did look before my last post, but as there was nothing in it (at least with the command "show logging buffer level 6") apart from entries like "Gi1/0/X changed state to up" as soon as I plugged the cables in, I did not mention it either.
The other switch (a TL-SG2008P):
- Spoiler alert: this one put me on the right track
- As said in my last post, I had this switch as - more or less - a backup on my network without doing much.
- So I set up LACP there (2 links, auto speed, default hash algorithm, i.e. just as I had configured it on the TL-SG3428) and plugged in the NAS with the same cables as before... and bam: it worked like a charm immediately.
- This led me to suspect a bug or something similar with that particular switch/firmware/port/whatever. So I took some simple RJ45 couplers and hooked the NAS up to another TL-SG3428 (the one at the other end of the LAG1 connection you already saw several times above)... and, well, there too the simplest default LACP config did it and everything just worked.
- Next, I connected the backup TL-SG2008P to the problematic TL-SG3428 using the already configured LAGs. Result: the same as with the Synology. The problematic TL-SG3428 immediately reported "state = s - suspended" for the links.
- I was about to reset the problematic TL-SG3428, but then thought: maybe the good old Microsoft Windows rule works, let's give it a reboot... It rebooted and it worked. Without changing anything, the NAS was immediately pingable and both ports showed as "P - bundled in port-channel" in the CLI. The only thing that remained is that the NAS does not display the correct speed with the "synonet --show" command (the result is still "-1, unknown duplex, active mtu 1500").
So yeah, needless to say, I owe you guys! Thanks so much again and may this help anyone having a similar issue.