From: Jiri Wiesner <jwies...@suse.com>
Date: Sun, 16 Aug 2020 20:52:44 +0200
> When the ARP monitor is used for link detection, ARP replies are
> validated for all slaves (arp_validate=3) and fail_over_mac is set to
> active, two slaves of an active-backup bond may get stuck in a state
> where both of them are active and pass packets that they receive to
> the bond. This state makes IPv6 duplicate address detection fail. The
> state is reached thus:
> 1. The current active slave goes down because the ARP target
>    is not reachable.
> 2. The current ARP slave is chosen and made active.
> 3. A new slave is enslaved. This new slave becomes the current active
>    slave and can reach the ARP target.
> As a result, the current ARP slave stays active after the enslave
> action has finished and the log is littered with "PROBE BAD" messages:
>> bond0: PROBE: c_arp ens10 && cas ens11 BAD
> The workaround is to remove the slave with "going back" status from
> the bond and re-enslave it. This issue was encountered when DPDK PMD
> interfaces were being enslaved to an active-backup bond.
>
> It would be possible to fix the issue in bond_enslave() or
> bond_change_active_slave(), but the ARP monitor was fixed instead to
> keep most of the actions changing the current ARP slave in the ARP
> monitor code. The current ARP slave is set as inactive and backup
> during the commit phase. A new state, BOND_LINK_FAIL, has been
> introduced for slaves in the context of the ARP monitor. This allows
> administrators to see how slaves are rotated for sending ARP requests
> and attempts are made to find a new active slave.
>
> Fixes: b2220cad583c9 ("bonding: refactor ARP active-backup monitor")
> Signed-off-by: Jiri Wiesner <jwies...@suse.com>

Applied and queued up for -stable, thanks Jiri.