How Do Redundant Controllers Handle Split-Brain?

2025-12-25 19:11:59 - last edited 2025-12-31 07:12:35
Model: OC300  
Hardware Version: V1
Firmware Version: 6.0.0.36

If I add another OC300 in a different location on the same LAN, a failure between the two locations could cause split-brain, where each controller does not see the other and therefore both controllers assume that they are the primary controller. Once the failure has been resolved, both controllers will see that there is another primary controller on the LAN. Does the controller HA software contemplate and handle this situation? What happens?

#1
7 Reply
Re:How Do Redundant Controllers Handle Split-Brain?
2025-12-30 07:03:00 - last edited 2025-12-30 07:04:35

Hi  @Dr_Marty 

 

Thanks for posting here. Do you mean the setup described in "How to Configure Hot-Standby Backup Mode on Omada Controller"?

 

Generally, the Primary Node is responsible for network management and running processes. The Secondary Node synchronizes data with the Primary Node. When the Primary Node goes down, the Secondary Node takes over network and client management. During the failover, the devices go offline for a short time, then reconnect to the new Primary Node. Once the devices are connected again, all services run normally. If the previous Primary Node recovers after the failover, it continues to run as a Secondary Node.

 

the two locations could cause split-brain

>>> Does this mean the two controllers lose connection with each other? That doesn't matter; the primary node will continue to manage the devices.

 

 

Let me know if I have misunderstood anything.

#2
Re:How Do Redundant Controllers Handle Split-Brain?
2025-12-30 18:35:09

  @Vincent-TP Thank you for getting back to me.

 

Suppose that I have two racks connected to each other as a single LAN: Rack A and Rack B. I have an OC300 in each rack configured in HA, with the OC300 in Rack A operating as primary and the OC300 in Rack B operating as secondary. Suppose then that the connection between the racks goes down. Now devices in each rack cannot see the devices in the other rack.

 

When the Primary Node goes down, the Secondary Node will take over network and clients management. During the failover, the devices will go offline for a short time, then they will reconnect to the new Primary Node.

 

Rack A controller will see Rack B devices disappear, but will continue as the primary controller. Rack B controller will see Rack A devices disappear, including the primary controller. Rack B controller will perform as described above and assume the primary role, adopting Rack B devices. At this point Rack A controller has deemed itself as primary and Rack B controller has deemed itself as primary. Neither controller can see the other rack devices.

 

Now suppose that the connection between the racks is restored. At this point, all devices in each rack can see all devices in both racks. Rack A controller is primary and controlling Rack A devices but also sees that Rack B controller is operating as primary and controlling Rack B devices. Rack B controller has a similar view.

 

Each rack now has a primary controller controlling the devices in that rack. What happens?

#3
Re:How Do Redundant Controllers Handle Split-Brain?-Solution
2025-12-31 07:10:48 - last edited 2025-12-31 07:12:35

Hi  @Dr_Marty 

 

A: Primary Controller

B: Secondary Controller

Under regular operation, A manages all SDN devices; B is on standby.

When A and B lose communication but A is still up: A continues to manage all the SDN devices; B stays on standby.

When A and B lose communication and A is down: B takes over all SDN devices and becomes the primary node.

When A is restored and A and B reconnect: B, the new primary node, continues to manage all the SDN devices; A becomes the secondary node, on standby.

A and B will not work as the primary node simultaneously.
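
For readers who prefer to see the transitions written out, here is a minimal Python sketch of the role changes listed in this reply. It only models the described behavior; the event names and the `next_primary` function are hypothetical and are not part of the Omada software.

```python
# Hypothetical model of the role transitions listed in this reply.
# The event names and this function are illustrative; they are not Omada APIs.

def next_primary(primary: str, event: str) -> str:
    """Return which node ("A" or "B") is primary after the event."""
    other = "B" if primary == "A" else "A"
    if event == "primary_down":
        # The standby detects the failure and takes over the devices.
        return other
    if event in ("link_lost_primary_up", "old_primary_restored"):
        # Neither event changes roles: no takeover and no automatic failback.
        return primary
    raise ValueError(f"unknown event: {event}")


primary = "A"
history = []
for event in ("link_lost_primary_up", "primary_down", "old_primary_restored"):
    primary = next_primary(primary, event)
    history.append((event, primary))
    print(f"{event}: primary is now {primary}")
```

In this model a restored node never reclaims the primary role on its own, which matches the statement that the new primary keeps managing the devices.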

Recommended Solution
#4
Re:How Do Redundant Controllers Handle Split-Brain?
2026-01-02 03:21:56

  @Vincent-TP I don't want to beat this to death but I do want to understand.

 

When A and B lose communication, but A is still up: A continues to manage all the SDN devices; B is on standby.

 

If A and B cannot communicate, how does B know that A is up? Wouldn't B's perspective be that A went offline?

#5
Re:How Do Redundant Controllers Handle Split-Brain?
2026-01-04 03:49:49

Hi @Dr_Marty 

 

Backup Node takeover of SDN devices requires two conditions:

1. Detection that the primary node is offline;

2. Receipt of discovery packets sent by the SDN device.

 

The SDN devices will send discovery packets only when they are not managed by any controller.

 

When A and B lose communication, but A is still up: A continues to manage all the SDN devices; B is on standby.

In this scenario, the SDN device is still under normal management and thus will not send discovery packets within the local network. Even if the backup node detects that the primary is offline, it cannot take over these SDN devices because it does not receive the SDN device's discovery packet.
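
The two conditions can be written as a small predicate. This is only a sketch of the rule as stated in this reply; the function and parameter names are assumptions made for illustration, not Omada APIs.

```python
# Illustrative only: these names are assumptions, not Omada APIs.

def device_sends_discovery(managed: bool) -> bool:
    """A device sends discovery packets only when no controller manages it."""
    return not managed


def standby_takes_over(primary_seen_offline: bool, discovery_received: bool) -> bool:
    """Both conditions must hold before the backup node adopts a device."""
    return primary_seen_offline and discovery_received


# A and B lose communication, but A is still up and still managing the device:
# the device stays silent, so B cannot take over even though it thinks A is offline.
link_lost_a_up = standby_takes_over(
    primary_seen_offline=True,
    discovery_received=device_sends_discovery(managed=True),
)

# A is really down: the device becomes unmanaged and its discovery reaches B.
a_down = standby_takes_over(
    primary_seen_offline=True,
    discovery_received=device_sends_discovery(managed=False),
)

print(link_lost_a_up, a_down)
```

The second call assumes the discovery packet actually reaches the backup node, which depends on the network path between the device and that node.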

 

If anything is still unclear, feel free to let me know. I am glad to explain more.

#6
Re:How Do Redundant Controllers Handle Split-Brain?
2026-01-10 23:42:20
Thank you for clarifying.
#7
Re:How Do Redundant Controllers Handle Split-Brain?
7 hours ago

  @Vincent-TP 

Sorry to resurrect this, but I have the same question as the OP and am still not sure how it works. Let me be even more specific with the example that the OP already mentioned. Let's say the network structure is like this:

ControllerA -- SwitchA === SwitchB --- ControllerB

When everything is working fine, ControllerA is the primary and ControllerB is on standby. Both SwitchA and SwitchB are managed by ControllerA. If ControllerA fails by itself, then ControllerB assumes the primary role, readopts both SwitchA and SwitchB, and keeps on trucking. When ControllerA comes back online, it becomes secondary (I think?).

So far so good, it is clear.

But imagine now that the cable between SwitchA and SwitchB (denoted by ===) is broken, but everything else is operational.

So ControllerA sees that ControllerB AND SwitchB dropped off. It will show SwitchB as offline, still think it is primary, and continue managing SwitchA, which it can still reach.
Now, ControllerB is actually still working, and what it sees is that ControllerA dropped off. It can't see SwitchA either, but it is not managing it (yet). It does see SwitchB, though. Since ControllerA is no longer visible, I believe ControllerB would assume it is primary now and needs to start managing devices. It sees that SwitchA is offline, so it can't manage it. You said that in order to readopt SwitchB, it needs to receive a discovery packet from the SDN device. But SwitchB was managed by ControllerA, and it can't see that controller anymore. So I assume SwitchB will start sending discovery packets, at which point ControllerB should adopt it (no different than if the whole A section went up in flames, except it didn't, but the B section doesn't know that).

So now as far as I can see, ControllerA thinks it is primary and is managing SwitchA, and ControllerB thinks it is primary and is managing SwitchB. That's the split-brain scenario.
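
The reasoning above can be checked with a small reachability sketch of this topology. It is a model of the argument in this post, not confirmed controller behavior: it applies the two takeover conditions from the earlier reply and additionally assumes that a device starts sending discovery packets once it can no longer reach its managing controller. All function names are hypothetical.

```python
from collections import deque

# Topology from this post; "===" is the SwitchA-SwitchB link.
LINKS = {
    ("ControllerA", "SwitchA"),
    ("SwitchA", "SwitchB"),
    ("SwitchB", "ControllerB"),
}


def reachable(links, start):
    """Return every node reachable from start over undirected links."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for a, b in links:
            for src, dst in ((a, b), (b, a)):
                if src == node and dst not in seen:
                    seen.add(dst)
                    queue.append(dst)
    return seen


def standby_takes_over(links, device, manager="ControllerA", standby="ControllerB"):
    """Apply the two takeover conditions from the standby's point of view."""
    seen_by_standby = reachable(links, standby)
    primary_offline = manager not in seen_by_standby
    # Assumption: a device that cannot reach its manager starts discovery.
    device_unmanaged = manager not in reachable(links, device)
    discovery_received = device_unmanaged and device in seen_by_standby
    return primary_offline and discovery_received


broken = LINKS - {("SwitchA", "SwitchB")}
print("healthy, SwitchB adopted by B:", standby_takes_over(LINKS, "SwitchB"))
print("broken,  SwitchB adopted by B:", standby_takes_over(broken, "SwitchB"))
print("broken,  SwitchA adopted by B:", standby_takes_over(broken, "SwitchA"))
```

Under that assumption, the broken-link case satisfies both conditions for SwitchB while ControllerA keeps managing SwitchA, which is the two-primaries situation this post asks about.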

Now, because both are primary controllers that see no backup, we can make some config changeA on SwitchA, and some config changeB on SwitchB through controllers A and B, respectively - and ControllerA doesn't know about changeB, and ControllerB doesn't know about changeA.

So now the connection between A and B (===) is restored. There are two controllers that both think they are primary, each managing one device, each seeing its previously offline device come back online, and each seeing another controller. changeA and changeB have also been applied, and neither controller knows that there are two different changes.

So, what happens here?
- Which controller is primary now? Is it ControllerA, non-deterministic (whoever comes first), something else? Or are both controllers primary, continuing to manage their respective devices (as these presumably won't be sending discovery packets, since they are being managed and nothing happened to their controller)?
- Do both switches get reconnected to the new primary, say ControllerA? What happens to changeB then? ControllerA never saw it. Will it be somehow incorporated into ControllerA's state, discarded/re-provisioned with the state before the split (hence losing changeB), something else?

Please clarify - thanks.

#8