On Tue, Jan 07, 2025 at 08:01:17PM +0300, Vitaliy Makkoveev wrote:
> On Tue, Jan 07, 2025 at 05:23:46PM +0100, Alexander Bluhm wrote:
> > On Tue, Jan 07, 2025 at 06:21:50PM +0300, Vitaliy Makkoveev wrote:
> > > > On 7 Jan 2025, at 17:25, Alexander Bluhm <bl...@openbsd.org> wrote:
> > > >
> > > > Hi,
> > > >
> > > > My daily netlink test found a crash during socket splicing.
> > > >
> > > > [-- MARK -- Tue Jan 7 08:05:00 2025]
> > > > uvm_fault(0xffffffff828c74e8, 0x7, 0, 2) -> e
> > > > kernel: page fault trap, code=2
> > > > Stopped at taskq_next_work+0x8e: movq %rdx,0x8(%rsi)
> > > > TID PID UID PRFLAGS PFLAGS CPU COMMAND
> > > > *213124 16048 0 0x14000 0x200 3 sosplice
> > > > 204927 99709 0 0x14000 0x200 0 softnet0
> > > > taskq_next_work(ffff800000078000,ffff8000359fc4c0) at taskq_next_work+0x8e
> > > > taskq_thread(ffff800000078000) at taskq_thread+0x10b
> > > > end trace frame: 0x0, count: 13
> > > > https://www.openbsd.org/ddb.html describes the minimum info required in bug
> > > > reports. Insufficient info makes it difficult to find and fix bugs.
> > > > ddb{3}> [-- MARK -- Tue Jan 7 08:10:00 2025]
> > > >
> > > > I have seen it once on real hardware and once as KVM guest. It
> > > > does not happen at the first test run, but after 4 to 8 runs it may
> > > > crash. Affected versions are
> > > >
> > > > OpenBSD 7.6-current (GENERIC.MP) #498: Mon Jan 6 12:16:01 MST 2025
> > > > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > > >
> > > > OpenBSD 7.6-current (GENERIC.MP) #cvs : D2025.01.07.00.00.00: Tue Jan 7 07:49:46 CET 2025
> > > > r...@ot48.obsd-lab.genua.de:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > > >
> > >
> > > Smells like the socket reference counting problem. I made two socket
> > > related diffs in the last days. The first one, which introduced the new
> > > sorele() and converted the sosplice() task with timeout
> > > re-initialization, was committed 2024/12/30. The second one, which
> > > switched sosplice() to shared locks, was committed at 2025/01/04.
> > >
> > > What was the last stable build? Did you try to run the sosplice test
> > > with my last diff reverted?
> >
> > The previous day it worked. But I am not sure how reliable the
> > crash is. It takes several runs until it happens.
> >
> > I will try with reverted "Relax sockets splicing locking" commit.
> > Maybe when I run specific tests many times I can make a reliable
> > statement what triggered it.
> >
>
> This adds dtrace support for sockets. Could you test crashing kernel
> with this diff too?

Crash happened with the IPv4 UDP splicing test with 10 parallel streams.

Currently I am running the relevant subtest in a loop with reverted
"Relax sockets splicing locking". No crash so far. But that does
not mean much, I have to run some hours, and make the same test
with current again to be sure.

The btrace refcount is great to find leaks. But for crashes it does
not help much as the latest btrace data is not printed by userland
when the kernel crashes.

Anyway, OK bluhm@ for the diff below.

> Index: sys/dev/dt/dt_prov_static.c
> ===================================================================
> RCS file: /cvs/src/sys/dev/dt/dt_prov_static.c,v
> diff -u -p -r1.23 dt_prov_static.c
> --- sys/dev/dt/dt_prov_static.c	6 Apr 2024 11:18:02 -0000	1.23
> +++ sys/dev/dt/dt_prov_static.c	7 Jan 2025 16:54:28 -0000
> @@ -100,6 +100,7 @@ DT_STATIC_PROBE3(refcnt, ifaddr, "void *
>  DT_STATIC_PROBE3(refcnt, ifmaddr, "void *", "int", "int");
>  DT_STATIC_PROBE3(refcnt, inpcb, "void *", "int", "int");
>  DT_STATIC_PROBE3(refcnt, rtentry, "void *", "int", "int");
> +DT_STATIC_PROBE3(refcnt, socket, "void *", "int", "int");
>  DT_STATIC_PROBE3(refcnt, syncache, "void *", "int", "int");
>  DT_STATIC_PROBE3(refcnt, tdb, "void *", "int", "int");
>  
> @@ -153,6 +154,7 @@ struct dt_probe *const dtps_static[] = {
>  	&_DT_STATIC_P(refcnt, ifmaddr),
>  	&_DT_STATIC_P(refcnt, inpcb),
>  	&_DT_STATIC_P(refcnt, rtentry),
> +	&_DT_STATIC_P(refcnt, socket),
>  	&_DT_STATIC_P(refcnt, syncache),
>  	&_DT_STATIC_P(refcnt, tdb),
>  };
> Index: sys/kern/uipc_socket.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/uipc_socket.c,v
> diff -u -p -r1.356 uipc_socket.c
> --- sys/kern/uipc_socket.c	4 Jan 2025 15:57:02 -0000	1.356
> +++ sys/kern/uipc_socket.c	7 Jan 2025 16:54:29 -0000
> @@ -154,7 +154,7 @@ soalloc(const struct protosw *prp, int w
>  	}
>  #endif
>  
> -	refcnt_init(&so->so_refcnt);
> +	refcnt_init_trace(&so->so_refcnt, DT_REFCNT_IDX_SOCKET);
>  	rw_init_flags(&so->so_lock, dom_name, RWL_DUPOK);
>  	rw_init(&so->so_rcv.sb_lock, "sbufrcv");
>  	rw_init(&so->so_snd.sb_lock, "sbufsnd");
> Index: sys/sys/refcnt.h
> ===================================================================
> RCS file: /cvs/src/sys/sys/refcnt.h,v
> diff -u -p -r1.12 refcnt.h
> --- sys/sys/refcnt.h	28 Aug 2023 14:50:02 -0000	1.12
> +++ sys/sys/refcnt.h	7 Jan 2025 16:54:29 -0000
> @@ -49,8 +49,9 @@ unsigned int refcnt_read(struct refcnt *
>  #define DT_REFCNT_IDX_IFMADDR	3
>  #define DT_REFCNT_IDX_INPCB	4
>  #define DT_REFCNT_IDX_RTENTRY	5
> -#define DT_REFCNT_IDX_SYNCACHE	6
> -#define DT_REFCNT_IDX_TDB	7
> +#define DT_REFCNT_IDX_SOCKET	6
> +#define DT_REFCNT_IDX_SYNCACHE	7
> +#define DT_REFCNT_IDX_TDB	8
>  
>  #endif /* _KERNEL */
>  
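
For readers following the diff, here is a minimal userland sketch of what a traced reference counter might look like. It is not the kernel code: struct model_refcnt, model_trace() and the other model_* names are hypothetical, and reading the three probe arguments ("void *", "int", "int") as object address, previous count and delta is an assumption. Only the index value 6 is taken from the diff.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical userland model, not the kernel's struct refcnt. */
#define MODEL_IDX_NONE		0
#define MODEL_IDX_SOCKET	6	/* mirrors DT_REFCNT_IDX_SOCKET above */

struct model_refcnt {
	unsigned int	count;		/* current number of references */
	int		trace_idx;	/* which probe to fire, 0 = none */
};

/* Emit one trace record: object address, previous count, delta (assumed). */
static void
model_trace(struct model_refcnt *r, unsigned int old, int delta)
{
	if (r->trace_idx != MODEL_IDX_NONE)
		printf("refcnt idx=%d obj=%p old=%u delta=%+d\n",
		    r->trace_idx, (void *)r, old, delta);
}

/* Start with one reference held by the creator and remember the index. */
static void
model_init_trace(struct model_refcnt *r, int idx)
{
	r->count = 1;
	r->trace_idx = idx;
	model_trace(r, 0, +1);
}

/* Take an additional reference; the object must still be alive. */
static void
model_take(struct model_refcnt *r)
{
	unsigned int old = r->count++;

	assert(old != 0);
	model_trace(r, old, +1);
}

/* Drop a reference; returns 1 when the last one is gone. */
static int
model_rele(struct model_refcnt *r)
{
	unsigned int old = r->count--;

	assert(old != 0);
	model_trace(r, old, -1);
	return (old == 1);
}
```

With records like these, a leak would show up as an object whose deltas never sum to zero, which fits the use for finding leaks described above. A crash is different: records that were buffered but not yet printed are lost.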