From: Jiri Wiesner <jwies...@suse.com>
Date: Sun, 16 Aug 2020 20:52:44 +0200

> When the ARP monitor is used for link detection, ARP replies are
> validated for all slaves (arp_validate=3) and fail_over_mac is set to
> active, two slaves of an active-backup bond may get stuck in a state
> where both of them are active and pass packets that they receive to
> the bond. This state makes IPv6 duplicate address detection fail. The
> state is reached thus:
> 1. The current active slave goes down because the ARP target
>    is not reachable.
> 2. The current ARP slave is chosen and made active.
> 3. A new slave is enslaved. This new slave becomes the current active
>    slave and can reach the ARP target.
> As a result, the current ARP slave stays active after the enslave
> action has finished and the log is littered with "PROBE BAD" messages:
>> bond0: PROBE: c_arp ens10 && cas ens11 BAD
> The workaround is to remove the slave with "going back" status from
> the bond and re-enslave it. This issue was encountered when DPDK PMD
> interfaces were being enslaved to an active-backup bond.
> 
> It would be possible to fix the issue in bond_enslave() or
> bond_change_active_slave() but the ARP monitor was fixed instead to
> keep most of the actions changing the current ARP slave in the ARP
> monitor code. The current ARP slave is set as inactive and backup
> during the commit phase. A new state, BOND_LINK_FAIL, has been
> introduced for slaves in the context of the ARP monitor. This allows
> administrators to see how slaves are rotated for sending ARP requests
> and how attempts are made to find a new active slave.
> 
> Fixes: b2220cad583c9 ("bonding: refactor ARP active-backup monitor")
> Signed-off-by: Jiri Wiesner <jwies...@suse.com>

Applied and queued up for -stable, thanks Jiri.
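
For anyone reading this in the archive, the stuck state from steps 1-3 of
the quoted message can be sketched as a toy model. This is not the kernel
code: the Slave and Bond classes and the reproduce() and commit() helpers
are hypothetical names made up for illustration, and clearing the current
ARP slave in commit() is an assumption. Only the interface names and the
"set as inactive during the commit phase" idea come from the patch
description.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Slave:
    name: str
    active: bool = False  # True: received packets are passed up to the bond


@dataclass
class Bond:
    slaves: List[Slave] = field(default_factory=list)
    curr_active: Optional[Slave] = None  # "cas" in the PROBE log line
    curr_arp: Optional[Slave] = None     # "c_arp" in the PROBE log line

    def active_names(self) -> List[str]:
        return [s.name for s in self.slaves if s.active]


def reproduce() -> Bond:
    """Walk through steps 1-3 of the commit message in the toy model."""
    bond = Bond()
    ens10 = Slave("ens10")
    bond.slaves.append(ens10)

    # Steps 1-2: the ARP target is unreachable, so there is no active
    # slave; ens10 is chosen as the current ARP slave and made active.
    bond.curr_active = None
    bond.curr_arp = ens10
    ens10.active = True

    # Step 3: ens11 is enslaved and becomes the current active slave.
    # Nothing resets ens10, so two slaves are now active at once.
    ens11 = Slave("ens11", active=True)
    bond.slaves.append(ens11)
    bond.curr_active = ens11
    return bond


def commit(bond: Bond) -> None:
    """Toy commit phase: deactivate a stale current ARP slave."""
    stale = bond.curr_arp
    if bond.curr_active is not None and stale is not None \
            and stale is not bond.curr_active:
        stale.active = False   # back to inactive/backup
        bond.curr_arp = None   # assumption: the toy model drops the reference


bond = reproduce()
print(bond.active_names())  # ['ens10', 'ens11']
commit(bond)
print(bond.active_names())  # ['ens11']
```

The first print shows the condition behind the "PROBE BAD" messages (c_arp
ens10 alongside cas ens11); the second shows the single active slave that
an active-backup bond is expected to have.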
