> Date: Tue, 25 Jan 2022 10:00:45 +0100
> From: Stefan Sperling <[email protected]>
>
> On Tue, Jan 25, 2022 at 09:32:21AM +0100, Mark Kettenis wrote:
> > Happened again while still on a Jan 16 snapshot kernel.  So it is not
> > related to that diff.
> >
> > Here is the panic message and backtrace:
> >
> > panic: kernel diagnostic assertion "sc->task_refs.refs == 0" failed: file
> > "/usr/src/sys/dev/pci/if_iwm.c", line 9981
> > Stopped at      db_enter+0x10:  popq    %rbp
> >     TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
> > *120744  85293      0         0x3          0    0K ifconfig
> > db_enter() at db_enter+0x10
> > panic() at panic+0xbf
> > __assert() at __assert+0x25
> > iwm_init() at iwm_init+0x254
> > iwm_ioctl() at iwm_ioctl+0xf9
> > ifioctl() at ifioctl+0x92b
> > soo_ioctl() at soo_ioctl+0x161
> > sys_ioctl() at sys_ioctl+0x2c4
> > syscall() at syscall+0x374
> > Xsyscall() at Xsyscall+0x128
> > end of kernel
> >
>
> Please try this patch.
>
> Upon resume, we set the task ref count to 1 in anticipation of
> the newstate task that will be triggered to move into SCAN state.
> iwm_add_task would bump the refcount to 2. The task would decrease
> refcount again when it is done, and refcnt_finalize() in iwm_stop()
> would eventually let the counter drop back to zero.
>
> Now, for some reason on your system the device is not responding to
> the driver's attempt to claim ownership. This looks like maybe some
> problem with the bus the device is attached to. I am not in a position
> to debug that issue, perhaps you could try? In any case, iwm_wakeup()
> errors out early, with task refcount 1 but with no task scheduled and
> IFF_RUNNING not set (meaning ioctl will not call iwm_stop()).

Yes, something still seems to be not 100% in the resume path.  I'll
see what I can do, but it happens infrequently.  It may actually have
started with the drm update.  So maybe it is just a timing issue that
happens because something in drm holds the kernel lock a bit longer
during resume...

> Later, you run ifconfig, the ioctl handler runs, and calls iwm_init().
> This function expects that we are in a clean initial state (as after boot),
> such that iwm_stop() was called beforehand to clear out any tasks, dropping
> the task ref counter back to zero.
>
> The KASSERT triggers but for the wrong reason: We don't have outstanding
> tasks, we have a bad reference counter. Only setting the ref counter to 1 if
> we are about to launch a task during resume should fix it, and this matches
> what iwx(4) is doing:

Running this now.  May take some time to reproduce the issue though.

> diff d26399562c831a7212cebc57463cc9931ff8aff2 /usr/src
> blob - 937f2cc28f6c85502031e4c9efa0a02c75fd1a6d
> file + sys/dev/pci/if_iwm.c
> --- sys/dev/pci/if_iwm.c
> +++ sys/dev/pci/if_iwm.c
> @@ -11719,8 +11719,6 @@ iwm_wakeup(struct iwm_softc *sc)
>  	struct ifnet *ifp = &sc->sc_ic.ic_if;
>  	int err;
>  
> -	refcnt_init(&sc->task_refs);
> -
>  	err = iwm_start_hw(sc);
>  	if (err)
>  		return err;
> @@ -11729,6 +11727,7 @@ iwm_wakeup(struct iwm_softc *sc)
>  	if (err)
>  		return err;
>  
> +	refcnt_init(&sc->task_refs);
>  	ifq_clr_oactive(&ifp->if_snd);
>  	ifp->if_flags |= IFF_RUNNING;
>  
>
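
For anyone following the refcount argument, it can be modelled in a few
lines of userland C.  This is only a sketch under assumptions: the
model_* names are made up for illustration and are not the kernel's
refcnt(9) API or the driver's real code.  The only behaviour taken from
the discussion above is that init sets the counter to 1, that the early
error return in the wakeup path skips everything after it, and that
iwm_init() asserts the counter is 0.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the kernel's struct refcnt; not the real thing. */
struct model_refcnt {
	unsigned int refs;
};

/* Hypothetical softc holding only the fields the argument needs. */
struct model_softc {
	struct model_refcnt task_refs;
	bool running;		/* stands in for IFF_RUNNING */
};

/* As with refcnt_init(9), the counter starts at 1, held by the owner. */
static void
model_refcnt_init(struct model_refcnt *r)
{
	r->refs = 1;
}

/*
 * Model of the resume path.  hw_ok says whether the device answered;
 * init_first selects the old ordering (init before the hardware checks).
 * Returns 0 on success, -1 if the hardware did not respond.
 */
static int
model_wakeup(struct model_softc *sc, bool hw_ok, bool init_first)
{
	if (init_first)
		model_refcnt_init(&sc->task_refs);
	if (!hw_ok)
		return -1;	/* early return: no task, not running */
	if (!init_first)
		model_refcnt_init(&sc->task_refs);
	sc->running = true;
	return 0;
}

/* Model of the stop path: drop the owner's reference again. */
static void
model_stop(struct model_softc *sc)
{
	assert(sc->task_refs.refs > 0);
	sc->task_refs.refs--;
	sc->running = false;
}

/* Would the KASSERT in iwm_init() hold?  It requires refs == 0. */
static bool
model_init_assert_holds(const struct model_softc *sc)
{
	return sc->task_refs.refs == 0;
}
```

With init_first set and hw_ok clear, the model ends up with refs == 1 and
running == false, so nothing ever calls model_stop() and the assertion
check fails.  With the patched ordering the counter is left untouched on
the error path, and on the success path a later model_stop() brings it
back to zero.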
