On Mon, Oct 07, 2024 at 02:04:04PM +0200, Matthieu Herrb wrote:
> On Mon, Oct 07, 2024 at 12:55:08PM +0100, Stuart Henderson wrote:
> > I have a bunch of machines running mostly X, mate desktop environment,
> > chromium.
> >
> > Occasionally there are panics while the machine is idle - I left
> > instructions and someone on-site managed to get some photos this time.
> >
> > uvm_fault(0xfffffd83fcb18838, 0x660, 0, 1) -> e
> > drm:pid95860:gen9_set_dc_state *ERROR* [drm] *ERROR* DC state mismatch (0x0
> > -> 0x2)
> > kernel: page fault trap, code=0
> > Stopped at bread_0x33: testq $0x180,0x60(%rax)
> > TID PID UID PRFLAGS PFLAGS CPU COMMAND
> > *189589 95860 35 0x18000012 0 0K Xorg
> > 184707 83687 0 0x14000 0x200 3 i915_modeset
> > 42913 60436 0 0x14000 0x200 2 i915-unordered
> > 472773 65237 0 0x14000 0x200 1 drmubwq
> > bread(...
> > ffs2_balloc(...
> > ffs_write(...
> > VOP_WRITE(...
> > vn_write(...
> > dofilewritev(...
> > sys_write(...
> > syscall(...
> > Xsyscall at Xsyscall+0x128
> > end of kernel
> > end trace frame: 0x7c83cbbb9040, count: 6
> >
> > I will be updating them to release this week, this one was running
>
> The only file that Xorg writes to is its log file. Sometimes, when
> input devices go bad, it can be very verbose.
>
> Can you check if any of the Xorg.0.log files are huge ?
>
> But anyways I think that X is innocent (except perhaps for filling the
> file system, but that shouldn't result in a panic).
>
> Imho It's either a pre-existing file system corruption or some other
> weird uvm bug.

What is strange is that 3 of the 4 CPUs are stuck in
taskq_thread -> taskq_next_work -> msleep.

On the other hand this looks like another vnode gone bad story.
Would be nice to know what bread+0x33 corresponds to.

According to my dump (which matches with the ddb output):

00000000000016d0 <bread>:
; {
...
    16eb: 49 c7 c0 ff ff ff ff          movq    $-0x1, %r8
    16f2: e8 00 00 00 00                callq   0x16f7 <bread+0x27>
    16f7: 48 c7 44 24 f8 00 00 00 00    movq    $0x0, -0x8(%rsp)
;       bp = getblk(vp, blkno, size, 0, INFSLP);
    1700: 49 89 c4                      movq    %rax, %r12
;       if (!ISSET(bp->b_flags, (B_DONE | B_DELWRI))) {
    1703: 48 f7 40 60 80 01 00 00       testq   $0x180, 0x60(%rax)      # imm = 0x180

Here it explodes dereferencing bp->b_flags.
Btw. it seems %rax is 0x600 since the fault address is 0x660.
So getblk() returned a bad buf ...

-- 
:wq Claudio
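
For reference, the %rax value follows from plain address arithmetic: the faulting instruction reads 0x60(%rax), so the fault address reported by uvm_fault() is %rax + 0x60. A minimal sketch in C, where the helper name base_from_fault is made up for illustration and is not a kernel function:

```c
#include <stdint.h>

/*
 * Illustration only (hypothetical helper, not kernel code):
 * recover the base register value from the faulting address and the
 * displacement encoded in the faulting instruction.
 *
 * For "testq $0x180, 0x60(%rax)" the CPU accesses %rax + 0x60, so
 * %rax = fault_addr - 0x60.
 */
static uint64_t
base_from_fault(uint64_t fault_addr, uint64_t disp)
{
	return fault_addr - disp;
}
```

Called with the fault address 0x660 from the panic and the displacement 0x60 from the disassembly, this gives 0x600. That is nowhere near a kernel address, which fits the conclusion that getblk() handed back a bogus pointer.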