Hi Cosmin,

On Mon, Apr 07, 2025 at 04:35:42PM +0300, Cosmin Ratiu wrote:
> Refactor the bonding ipsec offload operations to fix a number of
> long-standing control plane races between state migration and user
> deletion and a few other issues.
>
> xfrm state deletion can happen concurrently with
> bond_change_active_slave() operation. This manifests itself as a
> bond_ipsec_del_sa() call with x->lock held, followed by a
> bond_ipsec_free_sa() a bit later from a wq. The alternate path of
> these calls coming from xfrm_dev_state_flush() can't happen, as that
> needs the RTNL lock and bond_change_active_slave() already holds it.
>
> 1. bond_ipsec_del_sa_all() might call xdo_dev_state_delete() a second
>    time on an xfrm state that was concurrently killed. This is bad.
> 2. bond_ipsec_add_sa_all() can add a state on the new device, but
>    pending bond_ipsec_free_sa() calls from the old device will then hit
>    the WARN_ON() and then, worse, call xdo_dev_state_free() on the new
>    device without a corresponding xdo_dev_state_delete().
> 3. Resolve a sleeping in atomic context introduced by the mentioned
>    "Fixes" commit.
>
> bond_ipsec_del_sa_all() and bond_ipsec_add_sa_all() now acquire x->lock
> and check for x->km.state to help with problems 1 and 2. And since
> xso.real_dev is now a private pointer managed by the bonding driver in
> xfrm state, make better use of it to fully fix problems 1 and 2. In
> bond_ipsec_del_sa_all(), set xso.real_dev to NULL while holding both the
> mutex and x->lock, which makes sure that neither bond_ipsec_del_sa() nor
> bond_ipsec_free_sa() could run concurrently.
>
> Fix problem 3 by moving the list cleanup (which requires the mutex) from
> bond_ipsec_del_sa() (called from atomic context) to bond_ipsec_free_sa()
>
> Finally, simplify bond_ipsec_free_sa() by not using current_active_slave
> at all, because now that xso.real_dev is protected by locks it can be
> trusted to always reflect the offload device.
>
> Fixes: 2aeeef906d5a ("bonding: change ipsec_lock from spin lock to mutex")
> Signed-off-by: Cosmin Ratiu <cra...@nvidia.com>
> Reviewed-by: Leon Romanovsky <leo...@nvidia.com>
> ---
>  drivers/net/bonding/bond_main.c | 58 +++++++++++++++++++--------------
>  1 file changed, 33 insertions(+), 25 deletions(-)
>
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 443624504767..ede3287318f8 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -544,7 +544,20 @@ static void bond_ipsec_add_sa_all(struct bonding *bond)
>  			slave_warn(bond_dev, real_dev, "%s: failed to add SA\n", __func__);
>  			continue;
>  		}
> +
> +		spin_lock_bh(&ipsec->xs->lock);
> +		/* xs might have been killed by the user during the migration
> +		 * to the new dev, but bond_ipsec_del_sa() should have done
> +		 * nothing, as xso.real_dev is NULL.
> +		 * Delete it from the device we just added it to. The pending
> +		 * bond_ipsec_free_sa() call will do the rest of the cleanup.
> +		 */
> +		if (ipsec->xs->km.state == XFRM_STATE_DEAD &&
> +		    real_dev->xfrmdev_ops->xdo_dev_state_delete)
> +			real_dev->xfrmdev_ops->xdo_dev_state_delete(real_dev,
> +								    ipsec->xs);
>  		ipsec->xs->xso.real_dev = real_dev;
> +		spin_unlock_bh(&ipsec->xs->lock);
>  	}
>  out:
>  	mutex_unlock(&bond->ipsec_lock);
> @@ -559,7 +572,6 @@ static void bond_ipsec_del_sa(struct net_device *bond_dev,
>  {
>  	struct net_device *real_dev;
>  	netdevice_tracker tracker;
> -	struct bond_ipsec *ipsec;
>  	struct bonding *bond;
>  	struct slave *slave;
>
> @@ -591,15 +603,6 @@ static void bond_ipsec_del_sa(struct net_device *bond_dev,
>  	real_dev->xfrmdev_ops->xdo_dev_state_delete(real_dev, xs);

Thanks a lot for the fixes. With your patch applied, I see that
bond_ipsec_del_sa() still has

	WARN_ON(xs->xso.real_dev != real_dev);

Do you think this condition can still occur? If so, should we call
xdo_dev_state_delete() on xs->xso.real_dev or on real_dev?

Thanks
Hangbin