Toni Peltonen <pel...@peltzi.fi> wrote: >Previously when unbinding a slave the 802.3ad implementation only told >partner that the port is not suitable for aggregation by setting the port >aggregation state from aggregatable to individual. This is not enough. If the >physical layer still stays up and we only unbinded this port from the bond >there >is nothing in the aggregation status alone to prevent the partner from sending >traffic towards us. To ensure that the partner doesn't consider this >port at all anymore we should also disable collecting and distributing to >signal that this actor is going away. > >I have tested this behaviour againts Arista EOS switches with mlx5 cards >(physical link stays up when even when interface is down) and simulated >the same situation virtually Linux <-> Linux with two network namespaces >running two veth device pairs. In both cases setting aggregation to >individual doesn't alone prevent traffic from being to sent towards this >port given that the link stays up in partners end. Partner still keeps >it's end in collecting + distributing state and continues until timeout is >reached. In most cases this means we are losing the traffic partner sends >towards our port while we wait for timeout. This is most visible with slow >periodic time (LAPC rate slow).
"LAPC" -> "LACP" >Other open source implementations like Open VSwitch and libreswitch, and >vendor implementations like Arista EOS, seem to disable collecting + >distributing to when doing similar port disabling/detaching/removing change. >With this patch kernel implementation would behave the same way and ensure >partner doesn't consider our actor viable anymore. After re-reading the relevant bits of 802.1AX (particularly 5.4.9 on recordPDU and update_Selected) I'm going to suggest also clearing AD_STATE_SYNCHRONIZATION, based on: Partner_Oper_Port_State.Synchronization is also set to TRUE if the value of Actor_State.Aggregation in the received PDU is set to FALSE (i.e., indicates an Individual link), Actor_State.Synchronization in the received PDU is set to TRUE, and LACP will actively maintain the link. Per the above, learing _SYNC in the LACPDU should un-sync the port, inducing the Mux state machine (figure 5-15) to exit C_D state and go to ATTACHED state (disabling Coll/Dist). But, either way, as this is a hint to get the link partner to stop using the port, this looks reasonable to me. Acked-by: Jay Vosburgh <jay.vosbu...@canonical.com> -J >Signed-off-by: Toni Peltonen <pel...@peltzi.fi> >--- > drivers/net/bonding/bond_3ad.c | 2 ++ > 1 file changed, 2 insertions(+) > >diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c >index f43fb2f958a5..6776c33753dc 100644 >--- a/drivers/net/bonding/bond_3ad.c >+++ b/drivers/net/bonding/bond_3ad.c >@@ -2086,6 +2086,8 @@ void bond_3ad_unbind_slave(struct slave *slave) > aggregator->aggregator_identifier); > > /* Tell the partner that this port is not suitable for aggregation */ >+ port->actor_oper_port_state &= ~AD_STATE_COLLECTING; >+ port->actor_oper_port_state &= ~AD_STATE_DISTRIBUTING; > port->actor_oper_port_state &= ~AD_STATE_AGGREGATION; > __update_lacpdu_from_port(port); > ad_lacpdu_send(port); >-- >2.19.0 --- -Jay Vosburgh, jay.vosbu...@canonical.com