> On 7 Jan 2025, at 17:25, Alexander Bluhm <bl...@openbsd.org> wrote:
>
> Hi,
>
> My daily netlink test found a crash during socket splicing.
>
> [-- MARK -- Tue Jan 7 08:05:00 2025]
> uvm_fault(0xffffffff828c74e8, 0x7, 0, 2) -> e
> kernel: page fault trap, code=2
> Stopped at taskq_next_work+0x8e: movq %rdx,0x8(%rsi)
> TID PID UID PRFLAGS PFLAGS CPU COMMAND
> *213124 16048 0 0x14000 0x200 3 sosplice
> 204927 99709 0 0x14000 0x200 0 softnet0
> taskq_next_work(ffff800000078000,ffff8000359fc4c0) at taskq_next_work+0x8e
> taskq_thread(ffff800000078000) at taskq_thread+0x10b
> end trace frame: 0x0, count: 13
> https://www.openbsd.org/ddb.html describes the minimum info required in bug
> reports. Insufficient info makes it difficult to find and fix bugs.
> ddb{3}> [-- MARK -- Tue Jan 7 08:10:00 2025]
>
> I have seen it once on real hardware and once as KVM guest. It
> does not happen at the first test run, but after 4 to 8 runs it may
> crash. Affected versions are
>
> OpenBSD 7.6-current (GENERIC.MP) #498: Mon Jan 6 12:16:01 MST 2025
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>
> OpenBSD 7.6-current (GENERIC.MP) #cvs : D2025.01.07.00.00.00: Tue Jan 7 07:49:46 CET 2025
> r...@ot48.obsd-lab.genua.de:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>
Smells like a socket reference counting problem. I made two socket-related diffs in the last few days. The first one, which introduced the new sorele() and converted the sosplice() task with timeout re-initialization, was committed on 2024/12/30. The second one, which switched sosplice() to shared locks, was committed on 2025/01/04. What was the last stable build? Have you tried running the sosplice test with my last diff reverted?