> On 7 Jan 2025, at 17:25, Alexander Bluhm <bl...@openbsd.org> wrote:
> 
> Hi,
> 
> My daily netlink test found a crash during socket splicing.
> 
> [-- MARK -- Tue Jan  7 08:05:00 2025]
> uvm_fault(0xffffffff828c74e8, 0x7, 0, 2) -> e
> kernel: page fault trap, code=2
> Stopped at      taskq_next_work+0x8e:   movq    %rdx,0x8(%rsi)
>    TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
> *213124  16048      0     0x14000      0x200    3  sosplice
> 204927  99709      0     0x14000      0x200    0  softnet0
> taskq_next_work(ffff800000078000,ffff8000359fc4c0) at taskq_next_work+0x8e
> taskq_thread(ffff800000078000) at taskq_thread+0x10b
> end trace frame: 0x0, count: 13
> https://www.openbsd.org/ddb.html describes the minimum info required in bug
> reports.  Insufficient info makes it difficult to find and fix bugs.
> ddb{3}> [-- MARK -- Tue Jan  7 08:10:00 2025]
> 
> I have seen it once on real hardware and once as a KVM guest.  It
> does not happen at the first test run, but after 4 to 8 runs it may
> crash.  Affected versions are
> 
> OpenBSD 7.6-current (GENERIC.MP) #498: Mon Jan  6 12:16:01 MST 2025
>    dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
> OpenBSD 7.6-current (GENERIC.MP) #cvs : D2025.01.07.00.00.00: Tue Jan  7 07:49:46 CET 2025
>    r...@ot48.obsd-lab.genua.de:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 

Smells like a socket reference counting problem. I made two socket-related
diffs in the last few days. The first, which introduced the new sorele() and
converted the sosplice() task with timeout re-initialization, was committed
on 2024/12/30. The second, which switched sosplice() to shared locks, was
committed on 2025/01/04.

What was the last stable build? Have you tried running the sosplice test
with my last diff reverted?
