Hi Tobias, On Mon, Nov 30, 2020 at 03:06:08PM +0100, Tobias Waldekranz wrote: > Monitor the following events and notify the driver when: > > - A DSA port joins/leaves a LAG. > - A LAG, made up of DSA ports, joins/leaves a bridge. > - A DSA port in a LAG is enabled/disabled (enabled meaning > "distributing" in 802.3ad LACP terms). > > Each LAG interface to which a DSA port is attached is represented by a > `struct dsa_lag` which is globally reachable from the switch tree and > from each associated port. > > When a LAG joins a bridge, the DSA subsystem will treat that as each > individual port joining the bridge. The driver may look at the port's > LAG pointer to see if it is associated with any LAG, if that is > required. This is analogue to how switchdev events are replicated out > to all lower devices when reaching e.g. a LAG. > > Signed-off-by: Tobias Waldekranz <tob...@waldekranz.com> > --- > include/net/dsa.h | 97 +++++++++++++++++++++++++++++++ > net/dsa/dsa2.c | 51 ++++++++++++++++ > net/dsa/dsa_priv.h | 31 ++++++++++ > net/dsa/port.c | 141 +++++++++++++++++++++++++++++++++++++++++++++ > net/dsa/slave.c | 83 ++++++++++++++++++++++++-- > net/dsa/switch.c | 49 ++++++++++++++++ > 6 files changed, 446 insertions(+), 6 deletions(-) > > +static inline struct dsa_lag *dsa_lag_by_id(struct dsa_switch_tree *dst, int > id) > +{ > + if (!test_bit(id, dst->lags.busy)) > + return NULL; > + > + return &dst->lags.pool[id]; > +} > + > +static inline struct net_device *dsa_lag_dev_by_id(struct dsa_switch_tree > *dst, > + int id) > +{ > + struct dsa_lag *lag = dsa_lag_by_id(dst, id); > + > + return lag ? rcu_dereference(lag->dev) : NULL; > +} > + > +static inline struct dsa_lag *dsa_lag_by_dev(struct dsa_switch_tree *dst, > + struct net_device *dev) > +{ > + struct dsa_lag *lag; > + int id; > + > + dsa_lag_foreach(id, dst) { > + lag = dsa_lag_by_id(dst, id); > + > + if (rtnl_dereference(lag->dev) == dev) > + return lag; > + } > + > + return NULL; > +} > > diff --git a/net/dsa/port.c b/net/dsa/port.c > index 73569c9af3cc..c2332ee5f5c7 100644 > --- a/net/dsa/port.c > +++ b/net/dsa/port.c > @@ -193,6 +193,147 @@ void dsa_port_bridge_leave(struct dsa_port *dp, struct > net_device *br) > dsa_port_set_state_now(dp, BR_STATE_FORWARDING); > } > > +static struct dsa_lag *dsa_lag_get(struct dsa_switch_tree *dst, > + struct net_device *dev) > +{ > + struct dsa_lag *lag; > + int id; > + > + lag = dsa_lag_by_dev(dst, dev); > + if (lag) { > + kref_get(&lag->refcount); > + return lag; > + } > + > + id = find_first_zero_bit(dst->lags.busy, dst->lags.num); > + if (id >= dst->lags.num) { > + WARN(1, "No LAGs available"); > + return NULL; > + } > + > + set_bit(id, dst->lags.busy); > + > + lag = &dst->lags.pool[id]; > + kref_init(&lag->refcount); > + lag->id = id; > + INIT_LIST_HEAD(&lag->ports); > + INIT_LIST_HEAD(&lag->tx_ports); > + > + rcu_assign_pointer(lag->dev, dev); > + return lag; > +} > + > +static void dsa_lag_release(struct kref *refcount) > +{ > + struct dsa_lag *lag = container_of(refcount, struct dsa_lag, refcount); > + > + rcu_assign_pointer(lag->dev, NULL); > + synchronize_rcu(); > + memset(lag, 0, sizeof(*lag)); > +}
What difference does it make if lag->dev is set to NULL right away or after a grace period? Squeezing one last packet from that bonding interface? Pointer updates are atomic operations on all architectures that the kernel supports, and, as long as you use WRITE_ONCE and READ_ONCE memory barriers, there should be no reason for RCU protection that I can see. And unlike typical uses of RCU, you do not free lag->dev, because you do not own lag->dev. Instead, the bonding interface pointed to by lag->dev is going to be freed (in case of a deletion using ip link) after an RCU grace period anyway. And the receive data path is under an RCU read-side critical section anyway. So even if you set lag->dev to NULL using WRITE_ONCE, the existing in-flight readers from the RX data path that had called dsa_lag_dev_by_id() will still hold a reference to a valid bonding interface.