> Date: Tue, 25 Jan 2022 10:00:45 +0100
> From: Stefan Sperling <[email protected]>
> 
> On Tue, Jan 25, 2022 at 09:32:21AM +0100, Mark Kettenis wrote:
> > Happened again while still on a Jan 16 snapshot kernel.  So it is not
> > related to that diff.
> > 
> > Here is the panic message and backtrace:
> > 
> > panic: kernel diagnostic assertion "sc->task_refs.refs == 0" failed: file 
> > "/usr/src/sys/dev/pci/if_iwm.c", line 9981
> > Stopped at      db_enter+0x10:  popq    %rbp
> >     TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
> > *120744  85293      0         0x3          0    0K ifconfig
> > db_enter() at db_enter+0x10
> > panic() at panic+0xbf
> > __assert() at __assert+0x25
> > iwm_init() at iwm_init+0x254
> > iwm_ioctl() at iwm_ioctl+0xf9
> > ifioctl() at ifioctl+0x92b
> > soo_ioctl() at soo_ioctl+0x161
> > sys_ioctl() at sys_ioctl+0x2c4
> > syscall() at syscall+0x374
> > Xsyscall() at Xsyscall+0x128
> > end of kernel
> > 
> 
> Please try this patch.
> 
> Upon resume, we set the task ref count to 1 in anticipation of
> the newstate task that will be triggered to move into SCAN state.
> iwm_add_task would bump the refcount to 2. The task would decrease
> refcount again when it is done, and refcnt_finalize() in iwm_stop()
> would eventually let the counter drop back to zero.
> 
> Now, for some reason on your system the device is not responding to
> the driver's attempt to claim ownership. This looks like maybe some
> problem with the bus the device is attached to. I am not in a position
> to debug that issue, perhaps you could try? In any case, iwm_wakeup()
> errors out early, with task refcount 1 but with no task scheduled and
> IFF_RUNNING not set (meaning ioctl will not call iwm_stop()).

Yes, something still seems to be not 100% right in the resume path.  I'll
see what I can do, but it happens infrequently.

It may actually have started with the drm update.  So maybe it is just
a timing issue that happens because something in drm holds the kernel
lock a bit longer during resume...

> Later, you run ifconfig, the ioctl handler runs, and calls iwm_init().
> This function expects that we are in a clean initial state (as after boot),
> such that iwm_stop() was called beforehand to clear out any tasks, dropping
> the task ref counter back to zero.
> 
> The KASSERT triggers but for the wrong reason: We don't have outstanding
> tasks, we have a bad reference counter. Only setting the ref counter to 1 if
> we are about to launch a task during resume should fix it, and this matches
> what iwx(4) is doing:

Running this now.  May take some time to reproduce the issue though.

> diff d26399562c831a7212cebc57463cc9931ff8aff2 /usr/src
> blob - 937f2cc28f6c85502031e4c9efa0a02c75fd1a6d
> file + sys/dev/pci/if_iwm.c
> --- sys/dev/pci/if_iwm.c
> +++ sys/dev/pci/if_iwm.c
> @@ -11719,8 +11719,6 @@ iwm_wakeup(struct iwm_softc *sc)
>       struct ifnet *ifp = &sc->sc_ic.ic_if;
>       int err;
>  
> -     refcnt_init(&sc->task_refs);
> -
>       err = iwm_start_hw(sc);
>       if (err)
>               return err;
> @@ -11729,6 +11727,7 @@ iwm_wakeup(struct iwm_softc *sc)
>       if (err)
>               return err;
>  
> +     refcnt_init(&sc->task_refs);
>       ifq_clr_oactive(&ifp->if_snd);
>       ifp->if_flags |= IFF_RUNNING;
>  
> 
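
For anyone following along, the ordering problem described above can be
modelled in userland. This is only a rough sketch, not driver code: the
model_* names and the hw_ok and init_early parameters are made up for
illustration, and the struct is a stand-in for the kernel's refcnt, not
the real API. It assumes, as explained above, that the ioctl path calls
iwm_stop() only when IFF_RUNNING is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userland stand-in for the kernel's struct refcnt (hypothetical). */
struct model_refcnt {
	unsigned int refs;
};

struct model_softc {
	struct model_refcnt task_refs;
	bool running;		/* models IFF_RUNNING */
};

/* Like refcnt_init(): the counter starts at 1. */
static void
model_refcnt_init(struct model_refcnt *r)
{
	r->refs = 1;
}

/*
 * Models iwm_wakeup().  hw_ok simulates whether the device responds
 * when the driver tries to claim ownership.  init_early selects the
 * ordering before the patch (true) or after it (false).
 */
static int
model_wakeup(struct model_softc *sc, bool hw_ok, bool init_early)
{
	assert(sc != NULL);

	if (init_early)
		model_refcnt_init(&sc->task_refs);

	if (!hw_ok)
		return -1;	/* early error return, nothing scheduled */

	if (!init_early)
		model_refcnt_init(&sc->task_refs);
	sc->running = true;
	return 0;
}

/*
 * Models the ioctl path: stop first only if the interface is running,
 * then report whether the "task_refs.refs == 0" assertion would hold.
 */
static bool
model_ioctl_init_ok(struct model_softc *sc)
{
	if (sc->running) {
		/* Stop path: finalizing drops the counter back to zero. */
		sc->task_refs.refs = 0;
		sc->running = false;
	}
	return sc->task_refs.refs == 0;
}
```

With init_early set and hw_ok false, model_wakeup() fails with the
counter left at 1 and running still false, so model_ioctl_init_ok()
reports that the assertion would fire. With init_early false, the same
failure leaves the counter untouched and the check passes.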
