On Wed, 2025-02-26 at 09:48 +0000, Hangbin Liu wrote:
> Hi Cosmin,
> On Tue, Feb 25, 2025 at 02:00:05PM +0000, Cosmin Ratiu wrote:
> > This got me to stare at the code again. What if we move the removal
> > of the xs from bond->ipsec from bond_ipsec_del_sa to
> > bond_ipsec_free_sa?
> > bond_ipsec_free_sa, unlike bond_ipsec_del_sa, is not called with
> > x->lock held. It is called from the xfrm gc task or directly via
> > xfrm_state_put_sync and therefore wouldn't suffer from the locking
> > issue.
> >
> > The tricky part is to make sure that inactive bond->ipsec entries
> > (after bond_ipsec_del_sa calls) do not cause issues if there's a
> > migration (bond_ipsec_del_sa_all is called) happening before
> > bond_ipsec_free_sa. Perhaps filtering by x->km.state !=
> > XFRM_STATE_DEAD in bond_ipsec_del_sa_all.
> >
> > What do you think about this idea?
>
> Thanks a lot for the comments. I also skipped the DEAD xs in
> add_sa_all.
> What about the patch like:

This is what I had in mind, thanks for proposing it. Maybe you should
package it in a new submission with a proper title/etc.?
I'll do the initial review here.

> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index e45bba240cbc..0e4db43a833a 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -537,6 +537,12 @@ static void bond_ipsec_add_sa_all(struct bonding *bond)
>  	}
>
>  	list_for_each_entry(ipsec, &bond->ipsec_list, list) {
> +		/* No need to handle DEAD XFRM, as it has already been
> +		 * deleted and will be freed later.
> +		 */

Nit: Maybe rephrase that as "Skip dead xfrm states, they'll be freed
later."

> +		if (ipsec->xs->km.state == XFRM_STATE_DEAD)
> +			continue;
> +
>  		/* If new state is added before ipsec_lock acquired */
>  		if (ipsec->xs->xso.real_dev == real_dev)
>  			continue;
> @@ -592,15 +598,6 @@ static void bond_ipsec_del_sa(struct xfrm_state *xs)
>  	real_dev->xfrmdev_ops->xdo_dev_state_delete(xs);
>  out:
>  	netdev_put(real_dev, &tracker);
> -	mutex_lock(&bond->ipsec_lock);
> -	list_for_each_entry(ipsec, &bond->ipsec_list, list) {
> -		if (ipsec->xs == xs) {
> -			list_del(&ipsec->list);
> -			kfree(ipsec);
> -			break;
> -		}
> -	}
> -	mutex_unlock(&bond->ipsec_lock);
>  }
>
>  static void bond_ipsec_del_sa_all(struct bonding *bond)
> @@ -617,6 +614,12 @@ static void bond_ipsec_del_sa_all(struct bonding *bond)
>
>  	mutex_lock(&bond->ipsec_lock);
>  	list_for_each_entry(ipsec, &bond->ipsec_list, list) {
> +		/* No need to handle DEAD XFRM, as it has already been
> +		 * deleted and will be freed later.
> +		 */
> +		if (ipsec->xs->km.state == XFRM_STATE_DEAD)
> +			continue;
> +

If this doesn't free dead entries now and bond_ipsec_add_sa_all is
called soon after, the pending bond_ipsec_free_sa() call will then hit
the WARN_ON(xs->xso.real_dev != real_dev) before attempting to call
free on the wrong device.
To fix that, these entries should be freed here and the WARN_ON in
bond_ipsec_free_sa() should be converted to an if...goto out, so that
bond_ipsec_free_sa() calls would hit one of these conditions:
1. "if (!slave)", when no active device exists.
2. "if (!xs->xso.real_dev)", when xdo_dev_state_add() failed.
3. "if (xs->xso.real_dev != real_dev)", when a DEAD xs was already
   freed by bond_ipsec_del_sa_all() migration to a new device.
In all 3 cases, xdo_dev_state_free() shouldn't be called, only xs
removed from the bond->ipsec list.

I hope I didn't miss any corner case.

>  		if (!ipsec->xs->xso.real_dev)
>  			continue;
>
> @@ -666,6 +669,16 @@ static void bond_ipsec_free_sa(struct xfrm_state *xs)
>  	real_dev->xfrmdev_ops->xdo_dev_state_free(xs);
>  out:
>  	netdev_put(real_dev, &tracker);
> +
> +	mutex_lock(&bond->ipsec_lock);
> +	list_for_each_entry(ipsec, &bond->ipsec_list, list) {
> +		if (ipsec->xs == xs) {
> +			list_del(&ipsec->list);
> +			kfree(ipsec);
> +			break;
> +		}
> +	}
> +	mutex_unlock(&bond->ipsec_lock);
>  }
>
>  /**

Cosmin.
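
P.S. In case it helps the discussion, here is a rough userspace model of
the three exit paths listed above. It is only a sketch, not kernel code:
every model_* type and function is a made-up stand-in (a plain singly
linked list replaces bond->ipsec_list, a counter replaces the
xdo_dev_state_free() callback, and locking and refcounting are left out
entirely). The only point it tries to show is that each early exit skips
the device free but still drops the list entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; none of these are kernel types. */
struct model_dev {
	int free_calls;             /* counts simulated xdo_dev_state_free() calls */
};

struct model_xs {
	struct model_dev *real_dev; /* models xs->xso.real_dev */
};

struct model_entry {
	struct model_xs *xs;
	struct model_entry *next;
};

struct model_bond {
	struct model_dev *active;   /* models the active slave's device, NULL if none */
	struct model_entry *list;   /* models bond->ipsec_list */
};

/* Put xs on the bond's list; returns false if allocation fails. */
static bool model_track(struct model_bond *bond, struct model_xs *xs)
{
	struct model_entry *e = malloc(sizeof(*e));

	if (!e)
		return false;
	e->xs = xs;
	e->next = bond->list;
	bond->list = e;
	return true;
}

/* Number of entries currently on the bond's list. */
static size_t model_tracked(const struct model_bond *bond)
{
	const struct model_entry *e;
	size_t n = 0;

	for (e = bond->list; e; e = e->next)
		n++;
	return n;
}

/*
 * Models the suggested bond_ipsec_free_sa() flow. Returns true only if the
 * simulated device free ran. The list entry is removed on every path.
 */
static bool model_free_sa(struct model_bond *bond, struct model_xs *xs)
{
	struct model_dev *real_dev;
	struct model_entry **pp;
	bool freed_on_dev = false;

	assert(bond && xs);
	real_dev = bond->active;

	if (!real_dev)                  /* case 1: no active device */
		goto out;
	if (!xs->real_dev)              /* case 2: state was never offloaded */
		goto out;
	if (xs->real_dev != real_dev)   /* case 3: was WARN_ON, now a plain exit */
		goto out;

	real_dev->free_calls++;         /* stands in for xdo_dev_state_free(xs) */
	freed_on_dev = true;
out:
	for (pp = &bond->list; *pp; pp = &(*pp)->next) {
		if ((*pp)->xs == xs) {
			struct model_entry *dead = *pp;

			*pp = dead->next;
			free(dead);
			break;
		}
	}
	return freed_on_dev;
}
```

With an xs whose real_dev matches the active device, model_free_sa()
bumps the counter; with a NULL or mismatched real_dev, or with no active
device, it leaves the counter alone, and in all of those cases the entry
is gone from the list afterwards.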