On 06/08/14 05:44, Ethan Jackson wrote:
Based on my (long ago) reading of the LACP spec, only supporting a
single aggregator is a valid configuration. Furthermore, it's what
makes the most sense given the structure of the OVS bonding
configuration. I'd really rather not make a non-standard change to
the protocol to support a buggy upstream mlag implementation, because I
don't know how it could affect other, less buggy switches. My
preference is to shelve this for now, FWIW.
I don't think there is too much to lose. This change just gives a
preference to slaves which actually claim they are willing to send and
receive traffic. I don't see how that could go wrong.
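To illustrate the idea, here is a minimal sketch of the kind of comparison
I have in mind; the struct and field names are made up for the example and
this is not the actual lacp.c code:

#include <stdbool.h>

struct slave_info {
    bool partner_up;     /* partner reported SYNC/COLLECTING in its LACPDU */
    int sys_priority;    /* partner system priority */
};

/* Returns true if 'a' should be preferred over 'b' when electing the lead. */
static bool
prefer_as_lead(const struct slave_info *a, const struct slave_info *b)
{
    /* Prefer a slave whose partner already claims it can carry traffic;
     * only fall back to the priority comparison when both (or neither) do. */
    if (a->partner_up != b->partner_up) {
        return a->partner_up;
    }
    return a->sys_priority < b->sys_priority;
}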
Also, this fix just makes life easier in a situation that the standard
does not really define, and makes the behaviour more similar to that of
the kernel bonding driver.
Ethan
On Tue, Aug 5, 2014 at 2:16 PM, Flavio Leitner <f...@redhat.com> wrote:
On Mon, Aug 04, 2014 at 12:08:48PM -0700, Andy Zhou wrote:
Zoltan,
Sorry it took a while to get back to you. I am just coming up to
speed on the OVS LACP implementation, so my understanding may not be
correct; please feel free to point out anything I get wrong.
According to the Wikipedia MC-LAG entry, there is no standard for it;
implementations are mostly designed and built by vendors.
After reading through the commit message and comparing it with the
802.1AX spec, this looks to me like a bug or configuration issue in the
MC-LAG implementation. When the partner on port A comes back, shouldn't
it wait for MC-LAG to sync before using the default profile to exchange
state with OVS?
I agree that it sounds like a problem in the MC-LAG. However, I also
agree that OVS could do better.
The aggregation selection policy is somewhat of a gray area not defined
in any spec. The bonding driver offers the ad_select= parameter, which
allows switching to a new aggregator only if, for instance, all the
ports in the active aggregator are down.
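For example, with the kernel bonding driver that policy is chosen when the
module is loaded (parameter names as documented in bonding.txt; the mode and
value below are just an illustration):

# keep the current aggregator until it has no usable slaves (the default)
modprobe bonding mode=802.3ad ad_select=stable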
The Team driver, which also implements 802.3ad, provides a selection-policy
parameter as well. The default is to consider the priority in the LACPDU,
but you can also tell it not to select another aggregator while the current
one is still usable, or to select by bandwidth or by the number of
available ports.
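If I remember teamd.conf(5) correctly (so treat the exact option names as an
assumption), that policy is set in the LACP runner config, roughly like:

{
    "device": "team0",
    "runner": {
        "name": "lacp",
        "agg_select_policy": "lacp_prio_stable"
    }
}

where "lacp_prio_stable" keeps the selected aggregator as long as it is
still usable.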
My suggestion, if we want to change something, is to stick with the
bonding driver's default behavior for selecting a new aggregator:
"""
stable or 0
The active aggregator is chosen by largest aggregate
bandwidth.
Reselection of the active aggregator occurs only when all
slaves of the active aggregator are down or the active
aggregator has no slaves.
This is the default value.
"""
Documentation/networking/bonding.txt
That would avoid problems with transient states like the reported one.
fbl
On Mon, Jul 14, 2014 at 3:11 PM, Ben Pfaff <b...@nicira.com> wrote:
On Tue, Jul 08, 2014 at 05:35:57PM +0100, Zoltan Kiss wrote:
This patch modifies the LACP selection logic by preferring slaves with up
and running partners when looking for a lead.
That fixes the following scenario:
- the bond has 2 ports, A and B, whose other ends are in separate chassis
kept in sync with MC-LAG
- the partner of port A is restarted
- port B is still working
- the partner on port A comes back, but temporarily it uses a default
config, as MC-LAG hasn't synced yet
- apparently that default config has a sys_priority which is smaller than
that of the other, still-running port, plus a completely different sys_id
- therefore OVS chooses port A even though it will never come up into
collecting-distributing state
- and port B is disabled, causing the whole bond to go down
Checking through the 802.1AX standard: when port A comes up again, the two
links fall apart due to their different LAG IDs. They should be attached to
different Aggregators, and the Aggregators should live separately. In OVS
there is no such concept as an Aggregator, but it could be said that OVS has
only one Aggregator, with its own policy for choosing which ports can join.
Although changing the chassis' default config can also fix this, detecting
such problems is quite hard, so I think it is still worthwhile to improve
things on the OVS side.
Btw, the Linux kernel bonding driver's LACP implementation allows multiple
aggregators, and therefore it can handle this situation properly.
Signed-off-by: Zoltan Kiss <zoltan.k...@citrix.com>
I verified that the unit tests still pass with this applied.
Andy Zhou said he'd review the patch.
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev