On Tue, Aug 05, 2014 at 09:44:40PM -0700, Ethan Jackson wrote:
> Based on my (long ago) reading of the LACP spec, only supporting a
> single aggregator is a valid configuration.  Furthermore, it's what
It is, no questions about that.

> makes the most sense given the structure of the OVS bonding
> configuration.  I'd really rather not make a non-standard change to
> the protocol to support a buggy upstream mlag implementation, because I
> don't know how it could affect other, less buggy switches.  My
> preference is to shelve this for now, FWIW.

Just to make it clear, I am talking about re-selection of the aggregator,
which is not specified in any standard, so it's not a non-standard
change.  A real-life example is the bonding driver, which has been doing
that by default for years without issues.

I am just commenting on this RFC anyway; I have no intention to push
any patch for that now.

fbl

>
> Ethan
>
> On Tue, Aug 5, 2014 at 2:16 PM, Flavio Leitner <f...@redhat.com> wrote:
> > On Mon, Aug 04, 2014 at 12:08:48PM -0700, Andy Zhou wrote:
> >> Zoltan,
> >>
> >> Sorry it took a while to get back to you. I am just coming up to
> >> speed on the OVS LACP implementation, so my understanding may not be
> >> correct. Please feel free to point it out if I am wrong.
> >>
> >> According to the Wikipedia MC-LAG entry, there is no standard for it;
> >> implementations are mostly designed and built by vendors.
> >>
> >> After reading through the commit message and comparing it with the
> >> 802.1AX spec, I feel this looks like a bug or configuration issue in
> >> the MC-LAG implementation. When the partner on port A comes
> >> back again, should it wait for MC-LAG sync before using the default
> >> profile to exchange states with OVS?
> >
> > I agree that it sounds like a problem in the MC-LAG. However, I also
> > agree that OVS could do better.
> >
> > The aggregator selection policy is somewhat of a gray area, not defined
> > in any spec. The bonding driver offers the ad_select= parameter, which
> > allows switching to a new aggregator only if, for instance, all the
> > ports in the active aggregator are down.
> >
> > The Team driver implementing 802.3ad also provides a selection policy
> > parameter. The default is to consider the prio in the LACPDU, but you
> > can also tell it not to select any other aggregator if the current one
> > is still usable, or to select per bandwidth or per number of ports
> > available.
> >
> > My suggestion, if we want to change something, is to stick with the
> > bonding driver's default behavior for selecting a new aggregator:
> > """
> > stable or 0
> >
> >     The active aggregator is chosen by largest aggregate
> >     bandwidth.
> >
> >     Reselection of the active aggregator occurs only when all
> >     slaves of the active aggregator are down or the active
> >     aggregator has no slaves.
> >
> >     This is the default value.
> > """
> > Documentation/networking/bonding.txt
> >
> > That would avoid problems with transient states like the reported one.
> >
> > fbl
> >
> >> On Mon, Jul 14, 2014 at 3:11 PM, Ben Pfaff <b...@nicira.com> wrote:
> >> > On Tue, Jul 08, 2014 at 05:35:57PM +0100, Zoltan Kiss wrote:
> >> >> This patch modifies the LACP selection logic by preferring slaves
> >> >> with up and running partners when looking for a lead.
> >> >> That fixes the following scenario:
> >> >> - bond has 2 ports, A and B, their other ends are in separate
> >> >>   chassis with MC-LAG sync
> >> >> - the partner of port A is restarted
> >> >> - port B is still working
> >> >> - the partner on port A comes back, but temporarily it is using a
> >> >>   default config, as MC-LAG hasn't synced yet
> >> >> - apparently that default config has a sys_priority which is smaller
> >> >>   than that of the other, still running port, plus a completely
> >> >>   different sys_id
> >> >> - therefore OVS chooses port A even though it won't ever come up into
> >> >>   collecting-distributing state
> >> >> - and port B is disabled, causing the whole bond to go down
> >> >>
> >> >> Checking through the 802.1AX standard, when port A comes up again, the
> >> >> two links fall apart due to the different LAG IDs. They should be
> >> >> attached to different Aggregators, and the Aggregators should live
> >> >> separately. In OVS there is no such concept as an Aggregator, but I
> >> >> think it should be said that it has only one Aggregator, and it has a
> >> >> unique policy to choose which ports can join.
> >> >> Although changing the chassis' default config can also fix this,
> >> >> detecting such problems is quite hard, therefore I think it is still
> >> >> valid to improve things on the OVS side.
> >> >> Btw. the Linux kernel bonding driver's LACP implementation allows more
> >> >> aggregators, and therefore it could handle this situation properly.
> >> >>
> >> >> Signed-off-by: Zoltan Kiss <zoltan.k...@citrix.com>
> >> >
> >> > I verified that the unit tests still pass with this applied.
> >> >
> >> > Andy Zhou said he'd review the patch.
> >> _______________________________________________
> >> dev mailing list
> >> dev@openvswitch.org
> >> http://openvswitch.org/mailman/listinfo/dev
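
To make the policies discussed in this thread concrete, here is a minimal Python sketch of the reported scenario. It is a toy model only, not OVS or bonding driver code: the Port class, its fields, and the three function names are hypothetical, the partner system ID stands in for the full LAG ID, and the example priorities and IDs are made up. It compares picking a lead purely by partner priority (the behaviour the bug report describes), preferring partners that are up and running (the idea of the patch), and a reselection rule modeled on the quoted ad_select "stable" text.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Port:
    name: str
    partner_sys_id: str        # assumption: stands in for the full LAG ID
    partner_sys_priority: int  # lower value = higher priority
    partner_ready: bool        # partner can reach collecting/distributing
    link_up: bool = True
    speed_mbps: int = 10_000


def lead_by_priority(ports: List[Port]) -> Optional[Port]:
    """Pick the lead purely by lowest partner priority (reported behaviour)."""
    up = [p for p in ports if p.link_up]
    return min(up, key=lambda p: (p.partner_sys_priority, p.partner_sys_id),
               default=None)


def lead_prefer_ready(ports: List[Port]) -> Optional[Port]:
    """Prefer ports whose partner is up and running, then use priority."""
    up = [p for p in ports if p.link_up]
    return min(up, key=lambda p: (not p.partner_ready,
                                  p.partner_sys_priority,
                                  p.partner_sys_id),
               default=None)


def reselect_stable(active: Optional[str], ports: List[Port]) -> Optional[str]:
    """Keep the active aggregator while any of its ports has link.

    Only when it has no port left with link, pick the aggregator with the
    largest aggregate bandwidth, as in the quoted 'stable' description.
    """
    bandwidth: Dict[str, int] = {}
    for p in ports:
        if p.link_up:
            bandwidth[p.partner_sys_id] = (
                bandwidth.get(p.partner_sys_id, 0) + p.speed_mbps)
    if active in bandwidth:
        return active
    return max(bandwidth, key=bandwidth.get, default=None)


# Scenario from the bug report: the partner of port A has restarted and
# advertises a default config with a smaller sys_priority and another sys_id.
port_a = Port("A", "00:00:00:00:00:01", partner_sys_priority=1,
              partner_ready=False)
port_b = Port("B", "aa:bb:cc:dd:ee:ff", partner_sys_priority=100,
              partner_ready=True)
ports = [port_a, port_b]

priority_lead = lead_by_priority(ports)    # selects port A
ready_lead = lead_prefer_ready(ports)      # selects port B
kept = reselect_stable(port_b.partner_sys_id, ports)
```

In this model the priority-only rule moves the lead to port A, while both the "prefer ready partners" rule and the "stable" rule keep traffic on port B during the transient state. The "stable" rule only moves away from B's aggregator once B itself loses link.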