On 22/04/24(Mon) 16:18, Mark Kettenis wrote:
> > Date: Mon, 22 Apr 2024 15:39:55 +0200
> > From: Alexander Bluhm <bl...@openbsd.org>
> >
> > Hi,
> >
> > I see a witness lock order reversal warning with soreceive. It
> > happens during NFS regress tests. In /var/log/messages is more
> > context from regress.
> >
> > Apr 22 03:18:08 ot29 /bsd: uid 0 on
> > /mnt/regress-ffs/fstest_49fd035b8230791792326afb0604868b: out of inodes
> > Apr 22 03:18:21 ot29 mountd[6781]: Bad exports list line
> > /mnt/regress-nfs-server
> > Apr 22 03:19:08 ot29 /bsd: witness: lock order reversal:
> > Apr 22 03:19:08 ot29 /bsd: 1st 0xfffffd85c8ae12a8 vmmaplk (&map->lock)
> > Apr 22 03:19:08 ot29 /bsd: 2nd 0xffff80004c488c78 nfsnode (&np->n_lock)
> > Apr 22 03:19:08 ot29 /bsd: lock order data w2 -> w1 missing
> > Apr 22 03:19:08 ot29 /bsd: lock order "&map->lock"(rwlock) ->
> > "&np->n_lock"(rrwlock) first seen at:
> > Apr 22 03:19:08 ot29 /bsd: #0 rw_enter+0x6d
> > Apr 22 03:19:08 ot29 /bsd: #1 rrw_enter+0x5e
> > Apr 22 03:19:08 ot29 /bsd: #2 VOP_LOCK+0x5f
> > Apr 22 03:19:08 ot29 /bsd: #3 vn_lock+0xbc
> > Apr 22 03:19:08 ot29 /bsd: #4 vn_rdwr+0x83
> > Apr 22 03:19:08 ot29 /bsd: #5 vndstrategy+0x2ca
> > Apr 22 03:19:08 ot29 /bsd: #6 physio+0x204
> > Apr 22 03:19:08 ot29 /bsd: #7 spec_write+0x9e
> > Apr 22 03:19:08 ot29 /bsd: #8 VOP_WRITE+0x45
> > Apr 22 03:19:08 ot29 /bsd: #9 vn_write+0x100
> > Apr 22 03:19:08 ot29 /bsd: #10 dofilewritev+0x14e
> > Apr 22 03:19:08 ot29 /bsd: #11 sys_pwrite+0x60
> > Apr 22 03:19:08 ot29 /bsd: #12 syscall+0x588
> > Apr 22 03:19:08 ot29 /bsd: #13 Xsyscall+0x128
>
> You're not talking about this one isn't it?

This also seems to be in the correct order. vmmaplk before FS lock.
That's the order of physio(9) and uvm_fault().

> > Apr 22 03:19:08 ot29 /bsd: witness: lock order reversal:
> > Apr 22 03:19:08 ot29 /bsd: 1st 0xfffffd85c8ae12a8 vmmaplk (&map->lock)
> > Apr 22 03:19:08 ot29 /bsd: 2nd 0xffff80002ec41860 sbufrcv
> > (&so->so_rcv.sb_lock)
> > Apr 22 03:19:08 ot29 /bsd: lock order "&so->so_rcv.sb_lock"(rwlock) ->
> > "&map->lock"(rwlock) first seen at:
> > Apr 22 03:19:08 ot29 /bsd: #0 rw_enter_read+0x50
> > Apr 22 03:19:08 ot29 /bsd: #1 uvmfault_lookup+0x8a
> > Apr 22 03:19:08 ot29 /bsd: #2 uvm_fault_check+0x36
> > Apr 22 03:19:08 ot29 /bsd: #3 uvm_fault+0xfb
> > Apr 22 03:19:08 ot29 /bsd: #4 kpageflttrap+0x158
> > Apr 22 03:19:08 ot29 /bsd: #5 kerntrap+0x94
> > Apr 22 03:19:08 ot29 /bsd: #6 alltraps_kern_meltdown+0x7b
> > Apr 22 03:19:08 ot29 /bsd: #7 copyout+0x57
> > Apr 22 03:19:08 ot29 /bsd: #8 soreceive+0x99a
> > Apr 22 03:19:08 ot29 /bsd: #9 recvit+0x1fd
> > Apr 22 03:19:08 ot29 /bsd: #10 sys_recvfrom+0xa4
> > Apr 22 03:19:08 ot29 /bsd: #11 syscall+0x588
> > Apr 22 03:19:08 ot29 /bsd: #12 Xsyscall+0x128
> > Apr 22 03:19:08 ot29 /bsd: lock order data w1 -> w2 missing
>
> Unfortunately we don't see the backtrace for the reverse lock order.
> So it is hard to say something sensible. Without more information I'd
> say that taking "&so->so_rcv.sb_lock" before "&map->lock" is the
> correct lock order.

I agree. Now I'd be very grateful if someone could dig into WITNESS to
figure out why we see such reports. Are these false positives or are we
missing data from the code paths that we think are incorrect?
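
To make the question concrete, here is a toy sketch of the general idea
behind an order checker: remember every "held A while taking B" pair and
complain when the opposite pair shows up later. This is NOT the WITNESS
implementation. The names toy_acquire(), toy_release_all() and the lock
ids are made up for illustration, and it only tracks a single thread.

```c
#include <assert.h>

/* Hypothetical lock ids, loosely named after the locks in the report. */
enum { MAP_LOCK, NFSNODE_LOCK, SB_LOCK, NLOCKS };

/* order[a][b] != 0 means "b was acquired while a was held" was seen. */
static int order[NLOCKS][NLOCKS];
static int held[NLOCKS];	/* locks currently held, in acquire order */
static int nheld;

/*
 * Record that lock `id` is taken while the locks in held[] are held.
 * Returns 1 if the opposite order was recorded earlier, 0 otherwise.
 */
static int
toy_acquire(int id)
{
	int i, reversal = 0;

	assert(id >= 0 && id < NLOCKS);
	assert(nheld < NLOCKS);

	for (i = 0; i < nheld; i++) {
		if (order[id][held[i]])
			reversal = 1;		/* id -> held[i] seen before */
		else
			order[held[i]][id] = 1;	/* remember held[i] -> id */
	}
	held[nheld++] = id;
	return reversal;
}

/* Drop everything the (single, imaginary) thread holds. */
static void
toy_release_all(void)
{
	nheld = 0;
}
```

Taking SB_LOCK and then MAP_LOCK records that pair and returns 0. After
toy_release_all(), taking MAP_LOCK and then SB_LOCK returns 1. Note that
the checker only knows where it first saw one of the two orders. It
cannot tell which of them is the wrong one, which is the situation we
are in with the report above.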