On Tue, Jul 08, 2014 at 05:35:57PM +0100, Zoltan Kiss wrote:
> This patch modifies the LACP selection logic by prefering a slaves with up and
> running partners when looking for a lead.
> That fixes the following scenario:
> - bond has 2 ports, A and B, their other ends are in separate chassis with
>   MC-LAG sync
> - the partner of port A is restarted
> - port B is still working
> - the partner on port A comes back, but temporarily it is using a default
>   config, as MC-LAG haven't synced yet
> - apparently that default config has a sys_priority which is smaller than the
>   other, still running port, plus completely different sys_id
> - therefore OVS choose port A despite it won't ever comes up into
>   collecting-distributing state
> - and port B is disabled, causing the whole bond goes down
> 
> Checking through the 802.1ax standard, when port A comes up again, the two
> links fall apart due to the different LAG IDs. They should be attached to
> different Aggregators, and the Aggregators should live separately. In OVS 
> there
> is no such concept as Aggregator, but I think it should be said that it has 
> only
> one Aggregator, and it has an unique policy to choose which ports can join.
> Although changing the chassis' default config can also fix this, detecting
> such problems quite hard, therefore I think it is still valid to improve 
> things
> in OVS side.
> Btw. the Linux kernel bonding drivers' LACP implementation allows more
> aggregators, and therefore it could handle this situation properly.
> 
> Signed-off-by: Zoltan Kiss <zoltan.k...@citrix.com>

I verified that the unit tests still pass with this applied.

Andy Zhou said he'd review the patch.
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Reply via email to