On Wed, Dec 02, 2020 at 16:39, Jay Vosburgh <jay.vosbu...@canonical.com> wrote:
> Tobias Waldekranz <tob...@waldekranz.com> wrote:
>>I could look at hoisting up the linking op before the first
>>notification. My main concern is that this is a new subsystem to me, so
>>I am not sure how to determine the adequate test coverage for a change
>>like this.
>>
>>Another option would be to drop this change from this series and do it
>>separately. It would be nice to have both team and bond working though.
>>
>>Not sure why I am the first to run into this. Presumably the mlxsw LAG
>>offloading would be affected in the same way. Maybe their main use-case
>>is LACP.
>
> I'm not sure about mlxsw specifically, but in the configurations
> I see, LACP is by far the most commonly used mode, with active-backup a
> distant second. I can't recall the last time I saw a production
> environment using balance-xor.

Makes sense. We (Westermo) have a few customers using static LAGs, so it
does happen. That said, LACP is way more common for us as well.

> I think that in the perfect world there should be exactly one
> such notification, and occurring in the proper sequence. A quick look
> at the kernel consumers of the NETDEV_CHANGELOWERSTATE event (mlx5,
> mlxsw, and nfp, looks like) suggests that those shouldn't have an issue.
>
> In user space, however, there are daemons that watch the events,
> and may rely on the current ordering. Some poking around reveals odd
> bugs in user space when events are rearranged, so I think the prudent
> thing is to not mess with what's there now, and just add the one event
> here (i.e., apply your patch as-is).

This is exactly the sort of thing I was worried about. Thank you so much
for testing it!