On Wed, Feb 28, 2024 at 12:36:26PM +0100, Claudio Jeker wrote: > On Wed, Feb 28, 2024 at 12:26:43PM +0100, Marko Cupać wrote: > > Hi, > > > > thank you for looking into it, and for the advice. > > > > On Wed, 28 Feb 2024 10:13:06 +0000 > > Stuart Henderson <s...@spacehopper.org> wrote: > > > > > Please try to re-type at least the most important bits from a > > > screenshot so readers can quickly see which subsystems are involved. > > > > Below is manual transcript of whole screenshot, hopefully no typos. > > > > If you have any advice on what should I do if it happens again in order > > to get as much info for debuggers as possible, please let me know. > > > > splassert: assertwaitok: want 0 have 4 > > panic: kernel diagnostic assertion "p->p_wchan == NULL" failed: file > > "/usr/src/sys/kern/kern_sched.c", line 373 > > Stopped at db_enter+0x14: popq %rbp > > TID PID UID PRFLAGS PFLAGS CPU COMMAND > > 199248 36172 577 0x10 0 1 openvpn > > 490874 47446 0 0x14000 0x200 2 wg_handshake > > 71544 9311 0 0x14000 0x200 3 softnet0 > > db_enter() at db_enter+0x14 > > panic(ffffffff820a4b9f) at panic+0xc3 > > __assert(ffffffff82121fcb,ffffffff8209ae5f,175,ffffffff82092fbf) at > > assert+0x29 > > sched_chooseproc() at sched_chooseproc+0x26d > > mi_switch() at mi_switch+0x17f > > sleep_finish(0,1) at sleep_finish+0x107 > > rw_enter(ffff800008003cf0,2) at rw_enter+0x1ad > > noise_remote_ready(ffff800008003bf0) at noise_remote_ready+0x33 > > wg_qstart(fff800000a622a8) at wg_qstart+0x18c > > ifq_serialize(ffff800000a622a8,ffff800000a62390) at ifq_serialize+0xfd > > hfsc_deferred(ffff800000a62000) at hfsc_deferred+0x68 > > softclock_process_tick_timeout(ffff80000115e248,1) at > > softclock_process_tick_timeout+0xfb > > softclock(0) at softclock+0xb8 > > softintr_dispatch(0) at softintr_dispatch+0xeb > > end trace frame: 0xffff800020dbc730, count:0 > > > > WTF! wg(4) is just broken. How the hell should a sleeping rw_lock work > when called from inside a timeout aka softclock? This is interrupt context > code is not allowed to sleep there. >
Not only wg(4). Depends on interface queue usage, ifq_start() schedules (*if_qstart)() or calls it, so all the interfaces with use rwlock(9) in (*if_qstart)() handler are in risk. What about to always schedule (*if_qstart)()? Index: sys/net/hfsc.c =================================================================== RCS file: /cvs/src/sys/net/hfsc.c,v retrieving revision 1.49 diff -u -p -r1.49 hfsc.c --- sys/net/hfsc.c 11 Apr 2023 00:45:09 -0000 1.49 +++ sys/net/hfsc.c 28 Feb 2024 13:15:22 -0000 @@ -953,8 +953,7 @@ hfsc_deferred(void *arg) if (!HFSC_ENABLED(ifq)) return; - if (!ifq_empty(ifq)) - ifq_start(ifq); + ifq_start_deferred(ifq); hif = ifq_q_enter(&ifp->if_snd, ifq_hfsc_ops); if (hif == NULL) Index: sys/net/ifq.c =================================================================== RCS file: /cvs/src/sys/net/ifq.c,v retrieving revision 1.53 diff -u -p -r1.53 ifq.c --- sys/net/ifq.c 10 Nov 2023 15:51:24 -0000 1.53 +++ sys/net/ifq.c 28 Feb 2024 13:15:22 -0000 @@ -133,6 +133,12 @@ ifq_start(struct ifqueue *ifq) } else task_add(ifq->ifq_softnet, &ifq->ifq_bundle); } +void +ifq_start_deferred(struct ifqueue *ifq) +{ + if (ifq_len(ifq)) + task_add(ifq->ifq_softnet, &ifq->ifq_bundle); +} void ifq_start_task(void *p) Index: sys/net/ifq.h =================================================================== RCS file: /cvs/src/sys/net/ifq.h,v retrieving revision 1.41 diff -u -p -r1.41 ifq.h --- sys/net/ifq.h 10 Nov 2023 15:51:24 -0000 1.41 +++ sys/net/ifq.h 28 Feb 2024 13:15:22 -0000 @@ -430,6 +430,7 @@ void ifq_destroy(struct ifqueue *); void ifq_add_data(struct ifqueue *, struct if_data *); int ifq_enqueue(struct ifqueue *, struct mbuf *); void ifq_start(struct ifqueue *); +void ifq_start_deferred(struct ifqueue *); struct mbuf *ifq_deq_begin(struct ifqueue *); void ifq_deq_commit(struct ifqueue *, struct mbuf *); void ifq_deq_rollback(struct ifqueue *, struct mbuf *);