On Tue, Jan 07, 2025 at 05:23:46PM +0100, Alexander Bluhm wrote:
> On Tue, Jan 07, 2025 at 06:21:50PM +0300, Vitaliy Makkoveev wrote:
> > > On 7 Jan 2025, at 17:25, Alexander Bluhm <bl...@openbsd.org> wrote:
> > >
> > > Hi,
> > >
> > > My daily netlink test found a crash during socket splicing.
> > >
> > > [-- MARK -- Tue Jan 7 08:05:00 2025]
> > > uvm_fault(0xffffffff828c74e8, 0x7, 0, 2) -> e
> > > kernel: page fault trap, code=2
> > > Stopped at taskq_next_work+0x8e: movq %rdx,0x8(%rsi)
> > > TID PID UID PRFLAGS PFLAGS CPU COMMAND
> > > *213124 16048 0 0x14000 0x200 3 sosplice
> > > 204927 99709 0 0x14000 0x200 0 softnet0
> > > taskq_next_work(ffff800000078000,ffff8000359fc4c0) at taskq_next_work+0x8e
> > > taskq_thread(ffff800000078000) at taskq_thread+0x10b
> > > end trace frame: 0x0, count: 13
> > > https://www.openbsd.org/ddb.html describes the minimum info required in bug
> > > reports. Insufficient info makes it difficult to find and fix bugs.
> > > ddb{3}> [-- MARK -- Tue Jan 7 08:10:00 2025]
> > >
> > > I have seen it once on real hardware and once as a KVM guest. It
> > > does not happen on the first test run, but after 4 to 8 runs it may
> > > crash. Affected versions are:
> > >
> > > OpenBSD 7.6-current (GENERIC.MP) #498: Mon Jan 6 12:16:01 MST 2025
> > > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > >
> > > OpenBSD 7.6-current (GENERIC.MP) #cvs : D2025.01.07.00.00.00: Tue Jan 7 07:49:46 CET 2025
> > > r...@ot48.obsd-lab.genua.de:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > >
> >
> > Smells like the socket reference counting problem. I made two
> > socket-related diffs in the last few days. The first one, which
> > introduced the new sorele() and converted the sosplice() task with
> > timeout re-initialization, was committed on 2024/12/30. The second
> > one, which switched sosplice() to shared locks, was committed on
> > 2025/01/04.
> >
> > What was the last stable build? Have you tried running the sosplice
> > test with my last diff reverted?
>
> The previous day it worked. But I am not sure how reliable the
> crash is. It takes several runs until it happens.
>
> I will try with the "Relax sockets splicing locking" commit reverted.
> Maybe when I run specific tests many times I can make a reliable
> statement about what triggered it.
>
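The quoted thread suspects a socket reference counting problem, and the fault happens in taskq_next_work() while the sosplice test is running. One common way such a bug shows up is an object being released while a task for it is still queued. The userland sketch below is only a hedged illustration of the usual discipline. The names struct obj, obj_task_add(), obj_task_del() and obj_task_run() are hypothetical stand-ins, not the kernel's task_add(9) or refcnt_init(9) interfaces. The rule it shows is that the queue owns one reference while the task is pending, and that reference is dropped exactly once: either when the task runs or when it is successfully cancelled.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for a reference-counted object such as a socket.
 * Userland, single-threaded sketch: no locking, no atomics.
 */
struct obj {
	unsigned int	refs;		/* number of references held */
	bool		task_pending;	/* true while a task for it is queued */
	bool		freed;		/* set instead of free() so tests can check */
};

/* Drop one reference; dropping the last one "frees" the object. */
static void
obj_rele(struct obj *o)
{
	assert(o->refs > 0);
	if (--o->refs == 0)
		o->freed = true;	/* a real implementation would free here */
}

/*
 * Queue the object's task. The queue takes its own reference, but only
 * when the task was not already pending, so the count stays balanced.
 */
static bool
obj_task_add(struct obj *o)
{
	if (o->task_pending)
		return false;
	o->refs++;
	o->task_pending = true;
	return true;
}

/* Cancel a pending task and release the queue's reference. */
static bool
obj_task_del(struct obj *o)
{
	if (!o->task_pending)
		return false;
	o->task_pending = false;
	obj_rele(o);
	return true;
}

/* Run the pending task, then release the queue's reference. */
static void
obj_task_run(struct obj *o)
{
	assert(o->task_pending);
	o->task_pending = false;
	/* ... work on o happens here, while the reference is still held ... */
	obj_rele(o);
}
```

obj_task_add() takes a reference only when it actually queues the task. A caller that schedules an already pending task therefore neither leaks nor double-drops a reference. OpenBSD's task_add(9) likewise reports whether the task was newly queued.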
This adds dt(4) tracing of socket reference counts (a new refcnt:socket
static probe). Could you also test the crashing kernel with this diff
applied?

Index: sys/dev/dt/dt_prov_static.c
===================================================================
RCS file: /cvs/src/sys/dev/dt/dt_prov_static.c,v
diff -u -p -r1.23 dt_prov_static.c
--- sys/dev/dt/dt_prov_static.c	6 Apr 2024 11:18:02 -0000	1.23
+++ sys/dev/dt/dt_prov_static.c	7 Jan 2025 16:54:28 -0000
@@ -100,6 +100,7 @@ DT_STATIC_PROBE3(refcnt, ifaddr, "void *
 DT_STATIC_PROBE3(refcnt, ifmaddr, "void *", "int", "int");
 DT_STATIC_PROBE3(refcnt, inpcb, "void *", "int", "int");
 DT_STATIC_PROBE3(refcnt, rtentry, "void *", "int", "int");
+DT_STATIC_PROBE3(refcnt, socket, "void *", "int", "int");
 DT_STATIC_PROBE3(refcnt, syncache, "void *", "int", "int");
 DT_STATIC_PROBE3(refcnt, tdb, "void *", "int", "int");
 
@@ -153,6 +154,7 @@ struct dt_probe *const dtps_static[] = {
 	&_DT_STATIC_P(refcnt, ifmaddr),
 	&_DT_STATIC_P(refcnt, inpcb),
 	&_DT_STATIC_P(refcnt, rtentry),
+	&_DT_STATIC_P(refcnt, socket),
 	&_DT_STATIC_P(refcnt, syncache),
 	&_DT_STATIC_P(refcnt, tdb),
 };
Index: sys/kern/uipc_socket.c
===================================================================
RCS file: /cvs/src/sys/kern/uipc_socket.c,v
diff -u -p -r1.356 uipc_socket.c
--- sys/kern/uipc_socket.c	4 Jan 2025 15:57:02 -0000	1.356
+++ sys/kern/uipc_socket.c	7 Jan 2025 16:54:29 -0000
@@ -154,7 +154,7 @@ soalloc(const struct protosw *prp, int w
 	}
 #endif
 
-	refcnt_init(&so->so_refcnt);
+	refcnt_init_trace(&so->so_refcnt, DT_REFCNT_IDX_SOCKET);
 	rw_init_flags(&so->so_lock, dom_name, RWL_DUPOK);
 	rw_init(&so->so_rcv.sb_lock, "sbufrcv");
 	rw_init(&so->so_snd.sb_lock, "sbufsnd");
Index: sys/sys/refcnt.h
===================================================================
RCS file: /cvs/src/sys/sys/refcnt.h,v
diff -u -p -r1.12 refcnt.h
--- sys/sys/refcnt.h	28 Aug 2023 14:50:02 -0000	1.12
+++ sys/sys/refcnt.h	7 Jan 2025 16:54:29 -0000
@@ -49,8 +49,9 @@ unsigned int refcnt_read(struct refcnt *
 #define DT_REFCNT_IDX_IFMADDR	3
 #define DT_REFCNT_IDX_INPCB	4
 #define DT_REFCNT_IDX_RTENTRY	5
-#define DT_REFCNT_IDX_SYNCACHE	6
-#define DT_REFCNT_IDX_TDB	7
+#define DT_REFCNT_IDX_SOCKET	6
+#define DT_REFCNT_IDX_SYNCACHE	7
+#define DT_REFCNT_IDX_TDB	8
 
 #endif /* _KERNEL */
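The diff wires sockets into the existing refcnt tracing in three places. It adds a refcnt:socket static probe, gives it the index DT_REFCNT_IDX_SOCKET, and switches soalloc() from refcnt_init() to refcnt_init_trace(). The idea is that a counter initialized with a nonzero trace index reports every change to its probe, while an untraced counter stays silent.

The userland sketch below models that idea. It uses hypothetical names (struct traced_refcnt, trace_refcnt(), TRACE_IDX_*) and is not the kernel's implementation. Several details are assumptions: that index 0 means "not traced", that initialization itself emits an event, and that the probe arguments are the object pointer, the count before the change, and the delta. The last is inferred only from the "void *", "int", "int" probe signature.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical trace indices. 0 is assumed to mean "not traced";
 * TRACE_IDX_SOCKET reuses the value the diff gives DT_REFCNT_IDX_SOCKET.
 */
enum {
	TRACE_IDX_NONE = 0,
	TRACE_IDX_SOCKET = 6,
};

/* Hypothetical traced counter; a plain int because this is single-threaded. */
struct traced_refcnt {
	int	refs;
	int	traceidx;
};

static int nevents;	/* number of probe events emitted so far */

/* Hypothetical probe: object, count before the change, and the delta. */
static void
trace_refcnt(int idx, void *obj, int refs, int delta)
{
	if (idx == TRACE_IDX_NONE)
		return;
	nevents++;
	printf("refcnt:%d %p refs=%d delta=%+d\n", idx, obj, refs, delta);
}

/* Start with one reference and remember the trace index. */
static void
traced_init(struct traced_refcnt *r, int idx)
{
	r->refs = 1;
	r->traceidx = idx;
	trace_refcnt(idx, r, 0, 1);
}

/* Take a reference and report the old count. */
static void
traced_take(struct traced_refcnt *r)
{
	int old = r->refs++;

	trace_refcnt(r->traceidx, r, old, 1);
}

/* Drop a reference; returns 1 when the last one is gone. */
static int
traced_rele(struct traced_refcnt *r)
{
	int old;

	assert(r->refs > 0);
	old = r->refs--;
	trace_refcnt(r->traceidx, r, old, -1);
	return r->refs == 0;
}
```

In a trace like this, an imbalance appears as a release that reaches zero followed by further activity on the same pointer. That is presumably the kind of evidence the new socket probe is meant to help collect. The kernel's counters are updated atomically, which this single-threaded sketch omits.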