Hello,

On Sat, Dec 09, 2023 at 03:10:02AM +0300, Vitaliy Makkoveev wrote:
> On Sat, Dec 09, 2023 at 12:28:10AM +0100, Alexander Bluhm wrote:
> > On Sat, Dec 09, 2023 at 02:07:06AM +0300, Vitaliy Makkoveev wrote:
> > > > >       SLIST_ENTRY(pflow_softc) sc_next;
> > > > 
> > > > This list is protected by net lock.  Can you add an [N] here?
> > > > 
> > > 
> > > This is not true. The netlock is not taken while export_pflow() called
> > > from pf_purge_states(). I privately shared the diff to fix this, but not
> > > committed it yet. I will update it and share after committing this diff.
> > 
> > Why not hold the shared net lock while in pf?
> > 
> 
> Because the export_flow() should be called only if the PFSTATE_PFLOW
> flag is set? The netlock should be taken before PF_LOCK, so there is no
> reason to always take it in this path only to make optional
> export_pflow() call:
> 
> pf_purge_expired_states(const unsigned int limit, const unsigned int collect)
> {
>         /* skip */
> 
>         rw_enter_write(&pf_state_list.pfs_rwl);
>         PF_LOCK();
>         PF_STATE_ENTER_WRITE();
>         SLIST_FOREACH(st, &gcl, gc_list) {
>                 if (st->timeout != PFTM_UNLINKED)
>                         pf_remove_state(st);
> 
>                 pf_free_state(st);
>         }
> 
> pf_remove_state(struct pf_state *st)
> {                       
>         PF_ASSERT_LOCKED();
>       
>         /* skip */
>  
>         RBT_REMOVE(pf_state_tree_id, &tree_id, st);
> #if NPFLOW > 0
>         if (st->state_flags & PFSTATE_PFLOW)
>                 export_pflow(st);
> 
> However, I'm not the pf hacker, I's better to ask sashan@ and dlg@.
> 

    dlg@ and I are basically trying to remove all NET_LOCK() operations from
    pf(4), because we don't want pf(4) to be playing with global NET_LOCK().
    all callers to pf(4) should either obtain NET_LOCK() in case they need it.
    pf(4) should not care about NET_LOCK() at all. That's the ideal situation
    where we are heading to.

    I took a closer look at export_pflow() in current. It seems to me the
    function assumes caller holds a shared NET_LOCK() at least, just to protect
    consistency of `pflowif_list`. There should be an ASSERT() for this  and
    pf(4) indeed should make sure to call export_pflow() with all locks it
    needs obtained in right order. Because the ASSERT() was missing the bugs
    got discovered the hard way. I certainly don't mind to add NET_LOCK()
    into pf_purge_expired_states() to keep export_pflow() happy.

    on the other hand if there is a way to implement pflowif_list as lock-less
    (or move it ouf of NET_LOCK() scope), then this is a preferred way
    forward.

thanks and
regards
sashan

Reply via email to