Jarod Wilson <ja...@redhat.com> wrote:

>On 5/10/19 6:53 PM, Jay Vosburgh wrote:
>> Jarod Wilson <ja...@redhat.com> wrote:
>>
>>> There's currently a problem with toggling arp_validate on and off with an
>>> active-backup bond. At the moment, you can start up a bond, like so:
>>>
>>> modprobe bonding mode=1 arp_interval=100 arp_validate=0
>>> arp_ip_targets=192.168.1.1
>>> ip link set bond0 down
>>> echo "ens4f0" > /sys/class/net/bond0/bonding/slaves
>>> echo "ens4f1" > /sys/class/net/bond0/bonding/slaves
>>> ip link set bond0 up
>>> ip addr add 192.168.1.2/24 dev bond0
>>>
>>> Pings to 192.168.1.1 work just fine. Now turn on arp_validate:
>>>
>>> echo 1 > /sys/class/net/bond0/bonding/arp_validate
>>>
>>> Pings to 192.168.1.1 continue to work just fine. Now when you go to turn
>>> arp_validate off again, the link falls flat on it's face:
>>>
>>> echo 0 > /sys/class/net/bond0/bonding/arp_validate
>>> dmesg
>>> ...
>>> [133191.911987] bond0: Setting arp_validate to none (0)
>>> [133194.257793] bond0: bond_should_notify_peers: slave ens4f0
>>> [133194.258031] bond0: link status definitely down for interface ens4f0,
>>> disabling it
>>> [133194.259000] bond0: making interface ens4f1 the new active one
>>> [133197.330130] bond0: link status definitely down for interface ens4f1,
>>> disabling it
>>> [133197.331191] bond0: now running without any active interface!
>>>
>>> The problem lies in bond_options.c, where passing in arp_validate=0
>>> results in bond->recv_probe getting set to NULL. This flies directly in
>>> the face of commit 3fe68df97c7f, which says we need to set recv_probe =
>>> bond_arp_recv, even if we're not using arp_validate. Said commit fixed
>>> this in bond_option_arp_interval_set, but missed that we can get to that
>>> same state in bond_option_arp_validate_set as well.
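
For anyone following along, the state problem described above can be modelled in a few lines of userspace C. This is only a rough sketch, not the kernel code: every toy_* name is hypothetical, the struct holds just the three fields the description mentions, and the "buggy" setter is a guess at the shape of the logic based solely on the statement that arp_validate=0 ends up setting recv_probe to NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Toy userspace model. None of these names are the kernel's own. */
struct toy_bond {
    int arp_interval;              /* ms; 0 means the ARP monitor is off */
    int arp_validate;              /* 0 means validation is off */
    int (*recv_probe)(int frame);  /* receive hook the ARP monitor relies on */
};

/* Stand-in for the real receive handler. */
static int toy_arp_rcv(int frame)
{
    return frame;
}

/* arp_interval setter: installs the hook whenever the monitor is on. */
static void toy_set_arp_interval(struct toy_bond *b, int ms)
{
    b->arp_interval = ms;
    b->recv_probe = ms ? toy_arp_rcv : NULL;
}

/* Assumed shape of the reported behaviour: clearing arp_validate
 * also clears the hook, even though the monitor is still running. */
static void toy_set_arp_validate_buggy(struct toy_bond *b, int v)
{
    b->arp_validate = v;
    if (b->arp_interval)
        b->recv_probe = v ? toy_arp_rcv : NULL;
}

/* Proposed direction: the validate setter leaves recv_probe alone. */
static void toy_set_arp_validate_fixed(struct toy_bond *b, int v)
{
    b->arp_validate = v;
}

/* The monitor only sees ARP replies while a hook is installed. */
static int toy_monitor_sees_replies(const struct toy_bond *b)
{
    return b->arp_interval && b->recv_probe != NULL;
}

/* Replay the steps from the report with the given validate setter and
 * return whether the monitor can still see replies at the end. */
static int toy_replay(void (*set_validate)(struct toy_bond *, int))
{
    struct toy_bond b = { 0, 0, NULL };

    toy_set_arp_interval(&b, 100);         /* arp_interval=100 arp_validate=0 */
    assert(toy_monitor_sees_replies(&b));
    set_validate(&b, 1);                   /* echo 1 > .../arp_validate */
    assert(toy_monitor_sees_replies(&b));
    set_validate(&b, 0);                   /* echo 0 > .../arp_validate */
    return toy_monitor_sees_replies(&b);
}
```

In this model, toy_replay(toy_set_arp_validate_buggy) ends with no hook installed while arp_interval is still 100, which is the state in which the monitor would declare every slave down. toy_replay(toy_set_arp_validate_fixed) keeps the hook that the interval setter installed.
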
>>>
>>> One solution would be to universally set recv_probe = bond_arp_recv here
>>> as well, but I don't think bond_option_arp_validate_set has any business
>>> touching recv_probe at all, and that should be left to the arp_interval
>>> code, so we can just make things much tidier here.
>>>
>>> Fixes: 3fe68df97c7f ("bonding: always set recv_probe to bond_arp_rcv in arp
>>> monitor")
>>
>> Is the above Fixes: tag correct? 3fe68df97c7f is not the source
>> of the erroneous logic being removed, which was introduced by
>>
>> commit 29c4948293bfc426e52a921f4259eb3676961e81
>> Author: sfel...@cumulusnetworks.com <sfel...@cumulusnetworks.com>
>> Date: Thu Dec 12 14:10:38 2013 -0800
>>
>> bonding: add arp_validate netlink support
>
>I wasn't entirely sure that was the best choice for Fixes either, it was
>sort of more "Augments the Fix in", so I'd certainly have no objection to
>changing the Fixes tag to the earlier commit instead.

That would be my preference, as the 29c4948293bf commit looks to
be the change actually being fixed.

-J

---
-Jay Vosburgh, jay.vosbu...@canonical.com