On Wed, Mar 05, 2025 at 10:38:36AM +0200, Nikolay Aleksandrov wrote:
> > @@ -617,8 +614,18 @@ static void bond_ipsec_del_sa_all(struct bonding *bond)
> >  
> >     mutex_lock(&bond->ipsec_lock);
> >     list_for_each_entry(ipsec, &bond->ipsec_list, list) {
> 
> Second time - you should use list_for_each_entry_safe if you're walking and
> deleting elements from the list.

Sorry, I missed this comment. I will update it in the next version.
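
For reference, a minimal userspace sketch of why the _safe variant matters
when entries are removed during the walk. This is not the bonding code:
struct ipsec_entry, prune_dead(), push() and count() are hypothetical names
used only for illustration, and a plain singly linked list stands in for
struct list_head.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an entry on bond->ipsec_list. */
struct ipsec_entry {
	int dead;                 /* stand-in for km.state == XFRM_STATE_DEAD */
	struct ipsec_entry *next;
};

/*
 * Remove and free every dead entry. The successor is saved before the
 * current entry is freed, which is the job of the extra cursor argument
 * in list_for_each_entry_safe(). Returns the number of entries removed.
 */
static int prune_dead(struct ipsec_entry **head)
{
	struct ipsec_entry **link = head;
	struct ipsec_entry *cur, *tmp;
	int removed = 0;

	for (cur = *head; cur; cur = tmp) {
		tmp = cur->next;      /* save before cur may be freed */
		if (cur->dead) {
			*link = tmp;  /* unlink cur from the list */
			free(cur);
			removed++;
		} else {
			link = &cur->next;
		}
	}
	return removed;
}

/* Helper for building a list: push a new entry at the front. */
static struct ipsec_entry *push(struct ipsec_entry *head, int dead)
{
	struct ipsec_entry *e = malloc(sizeof(*e));

	assert(e != NULL);
	e->dead = dead;
	e->next = head;
	return e;
}

/* Helper: count the entries still on the list. */
static int count(const struct ipsec_entry *head)
{
	int n = 0;

	for (; head; head = head->next)
		n++;
	return n;
}
```

Reading cur->next after free(cur), as a non-safe walk would do, is a
use-after-free; saving it in tmp first avoids that.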

> 
> > +           spin_lock_bh(&ipsec->xs->lock);
> >             if (!ipsec->xs->xso.real_dev)
> > -                   continue;
> > +                   goto next;
> > +
> > +           if (ipsec->xs->km.state == XFRM_STATE_DEAD) {
> > +                   /* already dead no need to delete again */
> > +                   if (real_dev->xfrmdev_ops->xdo_dev_state_free)
> > +                           real_dev->xfrmdev_ops->xdo_dev_state_free(ipsec->xs);
> 
> Have you checked if .xdo_dev_state_free can sleep?
> I see at least one that can: mlx5e_xfrm_free_state().

Hmm, this brings us back to the initial problem. We tried to avoid sleeping
while holding a spin lock (bond_ipsec_del_sa), but the new code runs into
the same issue again.

Following your reply, I also checked xdo_dev_state_add() in
bond_ipsec_add_sa_all(), which may also sleep, e.g. mlx5e_xfrm_add_state().

If we drop the spin lock, the race comes back.
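
One general pattern for this kind of conflict is to make the decision under
the lock and run the sleeping call only after dropping it, with a flag so
that only one caller does the work. The sketch below is userspace only and
untested against the bonding/xfrm code; I have not checked whether it closes
the race here. Everything in it is hypothetical: fake_lock()/fake_unlock()
only track whether the "lock" is held, and struct sa_state, the claimed
flag, driver_free_may_sleep() and free_if_dead() are made-up names.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a spin lock: only tracks whether it is held. */
static int lock_held;

static void fake_lock(void)
{
	assert(!lock_held);
	lock_held = 1;
}

static void fake_unlock(void)
{
	assert(lock_held);
	lock_held = 0;
}

/* Hypothetical stand-in for an offloaded xfrm state. */
struct sa_state {
	int dead;       /* stand-in for km.state == XFRM_STATE_DEAD */
	int claimed;    /* set under the lock by whoever will free it */
	int free_calls; /* how many times the driver free hook ran */
};

/* Stand-in for a driver hook that may sleep: must run unlocked. */
static void driver_free_may_sleep(struct sa_state *s)
{
	assert(!lock_held);
	s->free_calls++;
}

/*
 * Decide under the lock, act after dropping it. Returns 1 if this
 * caller performed the free, 0 if there was nothing to do or another
 * caller had already claimed the state.
 */
static int free_if_dead(struct sa_state *s)
{
	int mine = 0;

	fake_lock();
	if (s->dead && !s->claimed) {
		s->claimed = 1;
		mine = 1;
	}
	fake_unlock();

	if (mine)
		driver_free_may_sleep(s);
	return mine;
}
```

A second call on the same state sees claimed set and skips the free, so the
hook runs at most once and never with the lock held.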

Any ideas about this?

thanks
Hangbin
