This patch modifies the LACP selection logic by prefering a slaves with up and running partners when looking for a lead. That fixes the following scenario: - bond has 2 ports, A and B, their other ends are in separate chassis with MC-LAG sync - the partner of port A is restarted - port B is still working - the partner on port A comes back, but temporarily it is using a default config, as MC-LAG haven't synced yet - apparently that default config has a sys_priority which is smaller than the other, still running port, plus completely different sys_id - therefore OVS choose port A despite it won't ever comes up into collecting-distributing state - and port B is disabled, causing the whole bond goes down
Checking through the 802.1ax standard, when port A comes up again, the two links fall apart due to the different LAG IDs. They should be attached to different Aggregators, and the Aggregators should live separately. In OVS there is no such concept as Aggregator, but I think it should be said that it has only one Aggregator, and it has an unique policy to choose which ports can join. Although changing the chassis' default config can also fix this, detecting such problems quite hard, therefore I think it is still valid to improve things in OVS side. Btw. the Linux kernel bonding drivers' LACP implementation allows more aggregators, and therefore it could handle this situation properly. Signed-off-by: Zoltan Kiss <zoltan.k...@citrix.com> --- v2: - fixing bitmask in lacp_partner_ready - query the right slave's partner state diff --git a/lib/lacp.c b/lib/lacp.c index 0d30e51..94a304c 100644 --- a/lib/lacp.c +++ b/lib/lacp.c @@ -579,6 +579,12 @@ lacp_wait(struct lacp *lacp) OVS_EXCLUDED(mutex) lacp_unlock(); } +static bool +lacp_partner_ready(struct slave *slave) +{ + return slave->partner.state & (LACP_STATE_COL | LACP_STATE_DIST); +} + /* Static Helpers. */ /* Updates the attached status of all slaves controlled by 'lacp' and sets its @@ -616,7 +622,8 @@ lacp_update_attached(struct lacp *lacp) OVS_REQUIRES(mutex) slave->attached = true; slave_get_priority(slave, &pri); - if (!lead || memcmp(&pri, &lead_pri, sizeof pri) < 0) { + if (!lead || (memcmp(&pri, &lead_pri, sizeof pri) < 0 && + lacp_partner_ready(slave))) { lead = slave; lead_pri = pri; } _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev