On Tue, Jul 08, 2014 at 05:35:57PM +0100, Zoltan Kiss wrote:
> This patch modifies the LACP selection logic by preferring slaves with up
> and running partners when looking for a lead.
> That fixes the following scenario:
> - bond has 2 ports, A and B, their other ends are in separate chassis with
>   MC-LAG sync
> - the partner of port A is restarted
> - port B is still working
> - the partner on port A comes back, but temporarily it is using a default
>   config, as MC-LAG hasn't synced yet
> - apparently that default config has a sys_priority which is smaller than
>   that of the other, still running port, plus a completely different sys_id
> - therefore OVS chooses port A even though it will never come up into
>   collecting-distributing state
> - and port B is disabled, causing the whole bond to go down
>
> Checking through the 802.1AX standard, when port A comes up again, the two
> links fall apart due to the different LAG IDs. They should be attached to
> different Aggregators, and the Aggregators should live separately. In OVS
> there is no such concept as Aggregator, but I think it should be said that
> it has only one Aggregator, and it has a unique policy to choose which
> ports can join.
> Although changing the chassis' default config can also fix this, detecting
> such problems is quite hard, therefore I think it is still valid to improve
> things on the OVS side.
> Btw. the Linux kernel bonding driver's LACP implementation allows more
> aggregators, and therefore it could handle this situation properly.
>
> Signed-off-by: Zoltan Kiss <zoltan.k...@citrix.com>

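For anyone following along, the scenario above can be modelled roughly as below. This is only an illustrative sketch in Python, not the OVS C code: the `Port` class, the `partner_up` flag, and the `choose_lead` and `attached_ports` helpers are hypothetical names, and I am assuming that the lead is the port whose partner has the lowest (sys_priority, sys_id) and that only ports sharing the lead's partner system get attached.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Port:
    """Hypothetical model of a bond port and what it knows about its partner."""
    name: str
    partner_sys_priority: int  # lower value means higher priority
    partner_sys_id: str
    partner_up: bool           # assumption: partner is up and running


def choose_lead(ports: Iterable[Port], prefer_running: bool = True) -> Optional[Port]:
    """Pick the lead port.

    With prefer_running=False only (sys_priority, sys_id) is compared,
    which models the behaviour described as problematic. With
    prefer_running=True, ports whose partner is up sort first.
    """
    def key(p: Port):
        not_running = (not p.partner_up) if prefer_running else False
        return (not_running, p.partner_sys_priority, p.partner_sys_id)

    return min(ports, key=key, default=None)


def attached_ports(ports: Iterable[Port], prefer_running: bool = True) -> List[str]:
    """Single-aggregator model: attach only ports matching the lead's partner system."""
    ports = list(ports)
    lead = choose_lead(ports, prefer_running)
    if lead is None:
        return []
    lead_system = (lead.partner_sys_priority, lead.partner_sys_id)
    return [p.name for p in ports
            if (p.partner_sys_priority, p.partner_sys_id) == lead_system]


# Port A: partner restarted with a default config (low sys_priority, not up yet).
port_a = Port("A", 100, "00:00:00:00:00:01", partner_up=False)
# Port B: partner still running with the synced MC-LAG config.
port_b = Port("B", 32768, "aa:bb:cc:dd:ee:ff", partner_up=True)

print(attached_ports([port_a, port_b], prefer_running=False))
print(attached_ports([port_a, port_b], prefer_running=True))
```

Under these assumptions, the priority-only comparison attaches just port A, which never reaches collecting-distributing state, while preferring running partners keeps port B attached.
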
I verified that the unit tests still pass with this applied. Andy Zhou said
he'd review the patch.