Omada - custom echo server not used for online detection (but link-up state is)
Hello,
I have a multi-WAN setup with an ER7206 managed by an Omada OC-200.
When WAN1 fails, everything is fine: within a matter of seconds WAN2 takes over without any interruption (even Teams calls continue to work).
But when I need to restart the device on WAN1 (which is just a cable box), the cable box brings up its LAN link before its own WAN link. That means the WAN1 interface on the ER7206 is up, but there is no connectivity to the internet.
Nevertheless, WAN1 is set back to active and everything in my network stops working.
So for some reason, does the link-up state of the WAN1 interface take priority over the echo server for online detection?
Hi @RobertMEF
Thanks for posting in our business forum.
I just ran a test and could not reproduce what you described. It works and switches back correctly. Tested on ER605 V1 1.3.0.
Online detection tests connectivity rather than the link status of the physical port; I verified this in my test. So I am fairly certain there is no problem with it. My settings are the same as yours as well.
Did you refresh the page and monitor its status?
@RobertMEF The link-up state reflects the physical link, so that status updates quickly. Online detection, however, has to send packets out of the WAN interface and wait for confirmation, which introduces a delay. That is why the physical connection of the WAN interface is displayed first. Try updating to the latest firmware and Omada Controller version to see whether that solves the problem.
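To illustrate the difference between the two signals, here is a minimal Python sketch of a failover policy that treats a WAN as usable only when its link is up and the echo server has answered several probes in a row. This is not how Omada is implemented; `WanState`, `record_probe`, `is_online`, `pick_active` and the `required_ok` threshold are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class WanState:
    """Hypothetical per-WAN health record (not Omada's data model)."""
    name: str
    link_up: bool = False
    consecutive_ok: int = 0


def record_probe(wan: WanState, reply_received: bool) -> None:
    # A reply extends the success streak; a missing reply resets it.
    wan.consecutive_ok = wan.consecutive_ok + 1 if reply_received else 0


def is_online(wan: WanState, required_ok: int = 3) -> bool:
    # Link-up alone is not enough: the echo server must also have
    # answered `required_ok` probes in a row (assumed threshold).
    return wan.link_up and wan.consecutive_ok >= required_ok


def pick_active(primary: WanState, backup: WanState, required_ok: int = 3) -> str:
    # Prefer the primary, but only once it has passed the probe gate.
    if is_online(primary, required_ok):
        return primary.name
    if is_online(backup, required_ok):
        return backup.name
    return "none"
```

With this kind of gate, a cable box that raises its LAN link before its own WAN is ready would keep traffic on the backup until echo replies actually arrive through the primary.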
I was indeed one or two updates behind on the Omada Controller firmware (the load balancer is up to date; as far as I can tell, the October 2022 firmware is the latest). After updating, the switch-over is smoother, but still not as smooth as expected, and some apps in the network get stuck when they are in the middle of a handshake at that very moment.
The TTL=118 connection is the backup line, the TTL=57 is the primary WAN.
When the link-up state is detected, the first packets are lost and it seems the LB is switching back to the backup WAN and then trying again to go to the primary WAN.
In this scenario I would expect not a single packet to be lost (except if the switch-over happens exactly within the 31 ms RTT of one ICMP echo, and then only that single ICMP packet should get lost).
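The effect described above can be illustrated with a small, idealized simulation that counts lost packets under two failback policies: switching as soon as the primary link comes up, versus switching only once the primary is actually reachable. One packet is sent per tick; the function name, the policy labels and the tick model are assumptions for illustration, and the "probe" policy is idealized as knowing reachability instantly.

```python
def lost_packets(policy: str, link_up_at: int, internet_up_at: int, ticks: int) -> int:
    """Count packets lost on the primary WAN; the backup is assumed always healthy."""
    lost = 0
    for t in range(ticks):
        link_up = t >= link_up_at
        reachable = t >= internet_up_at
        if policy == "link":
            # Fail back as soon as the physical link is up.
            use_primary = link_up
        elif policy == "probe":
            # Fail back only when echo probes would succeed (idealized).
            use_primary = link_up and reachable
        else:
            raise ValueError(f"unknown policy: {policy}")
        # A packet sent via the primary before it is reachable is lost.
        if use_primary and not reachable:
            lost += 1
    return lost
```

For example, with the link up at tick 2 and the internet reachable at tick 7, the "link" policy loses the packets sent in ticks 2 to 6, while the idealized "probe" policy loses none. A real probe-based policy would switch slightly later than the ideal, but would still avoid sending traffic into a WAN that is not yet reachable.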
Hi @RobertMEF
Thanks for posting in our business forum.
RobertMEF wrote
I was indeed one or two updates behind on the Omada Controller firmware (the load balancer is up to date; as far as I can tell, the October 2022 firmware is the latest). After updating, the switch-over is smoother, but still not as smooth as expected, and some apps in the network get stuck when they are in the middle of a handshake at that very moment.
The TTL=118 connection is the backup line, the TTL=57 is the primary WAN.
When the link-up state is detected, the first packets are lost and it seems the LB is switching back to the backup WAN and then trying again to go to the primary WAN.
In this scenario I would expect not a single packet to be lost (except if the switch-over happens exactly within the 31 ms RTT of one ICMP echo, and then only that single ICMP packet should get lost).
There has never been a zero-loss transition from one WAN to another. When a WAN fails, the router needs to detect the failure and then switch, which also takes resources. I don't think this will change in a future update.
Your description of the failover is correct. We don't have a better way yet.
Clive_A wrote
There has never been a zero-loss transition from one WAN to another. When a WAN fails, the router needs to detect the failure and then switch, which also takes resources. I don't think this will change in a future update.
Your description of the failover is correct. We don't have a better way yet.
The first packet loss is not necessary. That's the link-up of the interface, so there is no answer from the echo server at that very moment.
And because both WANs are online at the second switch, there is no reason for any packet loss there either. The switch is nothing more than a change in the routing table, with zero packet loss: one packet goes to WAN2 and the next packet goes to WAN1.
The resources needed are located on a different CPU in the load balancer and should not affect the routing.
So if there is no specific reason why it only works that way, I think there is a better way which is just not implemented yet.
Hi @RobertMEF
Thanks for posting in our business forum.
RobertMEF wrote
Clive_A wrote
There has never been a zero-loss transition from one WAN to another. When a WAN fails, the router needs to detect the failure and then switch, which also takes resources. I don't think this will change in a future update.
Your description of the failover is correct. We don't have a better way yet.
The first packet loss is not necessary. That's the link-up of the interface, so there is no answer from the echo server at that very moment.
And because both WANs are online at the second switch, there is no reason for any packet loss there either. The switch is nothing more than a change in the routing table, with zero packet loss: one packet goes to WAN2 and the next packet goes to WAN1.
The resources needed are located on a different CPU in the load balancer and should not affect the routing.
So if there is no specific reason why it only works that way, I think there is a better way which is just not implemented yet.
I am aware that the screenshot was for illustration. Thanks for clarifying that.
However, I am sorry to inform you that, after confirming with the developers, we cannot achieve this zero-loss load balancing.
You may want to look elsewhere for a product that suits your purpose. We appreciate your understanding.
For the original issue you posted earlier, I tested with ER605 V1 1.3.0 and the V5.12 Controller and failed to replicate it. The echo server works as expected: it tests Internet connectivity rather than the physical link.
The physical link status only shows "Link Down" when no cable is connected.
If you would like to discuss the echo server issue further, please try again in your topology and let me know your exact steps and details. I'll forward them to the test team for a test with your model.
Clive_A wrote
However, I am sorry to inform you that, after confirming with the developers, we cannot achieve this zero-loss load balancing.
You may want to look elsewhere for a product that suits your purpose. We appreciate your understanding.
Hi Clive,
Can you please go into more detail on this matter? How exactly do the switch and the echo test work, especially in the case when the primary WAN is coming back online?
Hi @RobertMEF
Thanks for posting in our business forum.
RobertMEF wrote
Clive_A wrote
However, I am sorry to inform you that, after confirming with the developers, we cannot achieve this zero-loss load balancing.
You may want to look elsewhere for a product that suits your purpose. We appreciate your understanding.
Hi Clive,
Can you please go into more detail on this matter? How exactly do the switch and the echo test work, especially in the case when the primary WAN is coming back online?
Two WANs, with the same settings as in your pictures.
For my test, I used other routers as the uplink, since you said your modem router boots up with DHCP running while its WAN is still down. I replicated that situation.
> Make sure the primary is up first, then pull the cable on the primary (the uplink router's WAN; the ER605 still gets a LAN IP from the uplink router but has no Internet)
> Primary is down (echo server works and tested Internet connectivity)
> Switch over
> Backup is Online
> Test Internet
> Good
> Connect the primary cable back (the uplink router)
> Internet recovers
> Switch back to primary
> Test Internet
> Good
> Backup line status: Offline
Echo server is set to a public one.
Hi @RobertMEF
Thanks for posting in our business forum.
RobertMEF wrote
Hi Clive, you said you spoke to the devs. Can you explain what exactly is happening internally in the Omada controller and in the LB during the steps in question? It would be interesting to know who is sending the echoes, at which times, what happens with the routing at the same time, and so on.
I asked the developers about what you requested, zero-loss failover. We cannot support it and will not. I did not discuss the routing-related points you raised earlier.
The echo server is configured on the router, and you also set the interval for online detection. Together, they test the Internet connectivity.
(You can probably lower the interval value, which might help with failover.)
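As a rough illustration of why a lower interval can help, the following Python sketch estimates the worst-case time to notice a failed WAN. It assumes the router sends probes on a fixed schedule and declares the WAN offline after a fixed number of consecutive unanswered probes. Only the interval is mentioned in this thread; `failures_needed` and `probe_timeout_s` are assumed parameters, and the actual detection logic of the router is not documented here.

```python
def worst_case_detection_s(interval_s: float, failures_needed: int,
                           probe_timeout_s: float) -> float:
    """Upper bound on detection delay under the assumptions stated above."""
    # Worst case: the WAN fails right after a successful probe, so a full
    # interval passes before the first failing probe is even sent.
    wait_for_first_failing_probe = interval_s
    # The remaining failing probes follow one interval apart.
    remaining_probes = (failures_needed - 1) * interval_s
    # The last failing probe must time out before the WAN is marked offline.
    return wait_for_first_failing_probe + remaining_probes + probe_timeout_s
```

Under these assumptions, a 10-second interval with 3 required failures and a 2-second timeout gives up to 32 seconds, while a 2-second interval with the same settings gives up to 8 seconds.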
If we don't support that, there is no point in discussing it with the developers, so the person I talked with did not go into this part. The current behavior is already acceptable for business-level networking environments, and we don't get many complaints from contract users, which is also a reason why it is not optimized to be zero-loss.
TBH, we cannot achieve zero-loss failover (on any LB-enabled device we have or had) at this price point and with this hardware. Off the record: consider Cisco or products of that grade if you are seeking top performance.