On Tue, Jan 07, 2025 at 05:23:46PM +0100, Alexander Bluhm wrote:
> On Tue, Jan 07, 2025 at 06:21:50PM +0300, Vitaliy Makkoveev wrote:
> > > On 7 Jan 2025, at 17:25, Alexander Bluhm <bl...@openbsd.org> wrote:
> > > 
> > > Hi,
> > > 
> > > My daily netlink test found a crash during socket splicing.
> > > 
> > > [-- MARK -- Tue Jan  7 08:05:00 2025]
> > > uvm_fault(0xffffffff828c74e8, 0x7, 0, 2) -> e
> > > kernel: page fault trap, code=2
> > > Stopped at      taskq_next_work+0x8e:   movq    %rdx,0x8(%rsi)
> > >    TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
> > > *213124  16048      0     0x14000      0x200    3  sosplice
> > > 204927  99709      0     0x14000      0x200    0  softnet0
> > > taskq_next_work(ffff800000078000,ffff8000359fc4c0) at taskq_next_work+0x8e
> > > taskq_thread(ffff800000078000) at taskq_thread+0x10b
> > > end trace frame: 0x0, count: 13
> > > https://www.openbsd.org/ddb.html describes the minimum info required in bug
> > > reports.  Insufficient info makes it difficult to find and fix bugs.
> > > ddb{3}> [-- MARK -- Tue Jan  7 08:10:00 2025]
> > > 
> > > I have seen it once on real hardware and once as a KVM guest.  It
> > > does not happen on the first test run, but after 4 to 8 runs it may
> > > crash.  Affected versions are:
> > > 
> > > OpenBSD 7.6-current (GENERIC.MP) #498: Mon Jan  6 12:16:01 MST 2025
> > >    dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > > 
> > > OpenBSD 7.6-current (GENERIC.MP) #cvs : D2025.01.07.00.00.00: Tue Jan  7 07:49:46 CET 2025
> > >    r...@ot48.obsd-lab.genua.de:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > > 
> > 
> > Smells like a socket reference counting problem. I committed two
> > socket-related diffs in the last few days. The first one, which introduced
> > the new sorele() and converted the sosplice() task with timeout
> > re-initialization, was committed on 2024/12/30. The second one, which
> > switched sosplice() to shared locks, was committed on 2025/01/04.
> > 
> > What was the last stable build? Have you tried running the sosplice test
> > with my last diff reverted?
> 
> The previous day it worked.  But I am not sure how reliable the
> crash is.  It takes several runs until it happens.
> 
> I will try with the "Relax sockets splicing locking" commit reverted.
> Maybe when I run specific tests many times I can make a reliable
> statement about what triggered it.
> 

This diff adds dtrace support for socket reference counting. Could you
also test the crashing kernel with this diff applied?

Index: sys/dev/dt/dt_prov_static.c
===================================================================
RCS file: /cvs/src/sys/dev/dt/dt_prov_static.c,v
diff -u -p -r1.23 dt_prov_static.c
--- sys/dev/dt/dt_prov_static.c 6 Apr 2024 11:18:02 -0000       1.23
+++ sys/dev/dt/dt_prov_static.c 7 Jan 2025 16:54:28 -0000
@@ -100,6 +100,7 @@ DT_STATIC_PROBE3(refcnt, ifaddr, "void *
 DT_STATIC_PROBE3(refcnt, ifmaddr, "void *", "int", "int");
 DT_STATIC_PROBE3(refcnt, inpcb, "void *", "int", "int");
 DT_STATIC_PROBE3(refcnt, rtentry, "void *", "int", "int");
+DT_STATIC_PROBE3(refcnt, socket, "void *", "int", "int");
 DT_STATIC_PROBE3(refcnt, syncache, "void *", "int", "int");
 DT_STATIC_PROBE3(refcnt, tdb, "void *", "int", "int");
 
@@ -153,6 +154,7 @@ struct dt_probe *const dtps_static[] = {
        &_DT_STATIC_P(refcnt, ifmaddr),
        &_DT_STATIC_P(refcnt, inpcb),
        &_DT_STATIC_P(refcnt, rtentry),
+       &_DT_STATIC_P(refcnt, socket),
        &_DT_STATIC_P(refcnt, syncache),
        &_DT_STATIC_P(refcnt, tdb),
 };
Index: sys/kern/uipc_socket.c
===================================================================
RCS file: /cvs/src/sys/kern/uipc_socket.c,v
diff -u -p -r1.356 uipc_socket.c
--- sys/kern/uipc_socket.c      4 Jan 2025 15:57:02 -0000       1.356
+++ sys/kern/uipc_socket.c      7 Jan 2025 16:54:29 -0000
@@ -154,7 +154,7 @@ soalloc(const struct protosw *prp, int w
        }
 #endif
 
-       refcnt_init(&so->so_refcnt);
+       refcnt_init_trace(&so->so_refcnt, DT_REFCNT_IDX_SOCKET);
        rw_init_flags(&so->so_lock, dom_name, RWL_DUPOK);
        rw_init(&so->so_rcv.sb_lock, "sbufrcv");
        rw_init(&so->so_snd.sb_lock, "sbufsnd");
Index: sys/sys/refcnt.h
===================================================================
RCS file: /cvs/src/sys/sys/refcnt.h,v
diff -u -p -r1.12 refcnt.h
--- sys/sys/refcnt.h    28 Aug 2023 14:50:02 -0000      1.12
+++ sys/sys/refcnt.h    7 Jan 2025 16:54:29 -0000
@@ -49,8 +49,9 @@ unsigned int  refcnt_read(struct refcnt *
 #define DT_REFCNT_IDX_IFMADDR  3
 #define DT_REFCNT_IDX_INPCB    4
 #define DT_REFCNT_IDX_RTENTRY  5
-#define DT_REFCNT_IDX_SYNCACHE 6
-#define DT_REFCNT_IDX_TDB      7
+#define DT_REFCNT_IDX_SOCKET   6
+#define DT_REFCNT_IDX_SYNCACHE 7
+#define DT_REFCNT_IDX_TDB      8
 
 #endif /* _KERNEL */
 
