On Mon, Oct 07, 2024 at 02:04:04PM +0200, Matthieu Herrb wrote:
> On Mon, Oct 07, 2024 at 12:55:08PM +0100, Stuart Henderson wrote:
> > I have a bunch of machines running mostly X, mate desktop environment,
> > chromium.
> > 
> > Occasionally there are panics while the machine is idle - I left
> > instructions and someone on-site managed to get some photos this time.
> > 
> > uvm_fault(0xfffffd83fcb18838, 0x660, 0, 1) -> e
> > drm:pid95860:gen9_set_dc_state *ERROR* [drm] *ERROR* DC state mismatch (0x0 -> 0x2)
> > kernel: page fault trap, code=0
> > Stopped at bread+0x33: testq $0x180,0x60(%rax)
> > TID PID UID PRFLAGS PFLAGS CPU COMMAND
> > *189589 95860 35 0x18000012 0 0K Xorg
> > 184707 83687 0 0x14000 0x200 3 i915_modeset
> > 42913 60436 0 0x14000 0x200 2 i915-unordered
> > 472773 65237 0 0x14000 0x200 1 drmubwq
> > bread(...
> > ffs2_balloc(...
> > ffs_write(...
> > VOP_WRITE(...
> > vn_write(...
> > dofilewritev(...
> > sys_write(...
> > syscall(...
> > Xsyscall at Xsyscall+0x128
> > end of kernel
> > end trace frame: 0x7c83cbbb9040, count: 6
> > 
> > I will be updating them to release this week, this one was running
> 
> The only file that Xorg writes to is its log file. Sometimes, when
> input devices go bad, it can be very verbose.
> 
> Can you check if any of the Xorg.0.log files are huge ?
> 
> But anyways I think that X is innocent (except perhaps for filling the
> file system, but that shouldn't result in a panic).
> 
> IMHO it's either pre-existing file system corruption or some other
> weird uvm bug.

What is strange is that 3 of the 4 CPUs are stuck in taskq_thread ->
taskq_next_work -> msleep.
On the other hand, this looks like another "vnode gone bad" story.
It would be nice to know what bread+0x33 corresponds to.
According to my dump (which matches the ddb output):
00000000000016d0 <bread>:
; {
...
    16eb: 49 c7 c0 ff ff ff ff          movq    $-0x1, %r8
    16f2: e8 00 00 00 00                callq   0x16f7 <bread+0x27>
    16f7: 48 c7 44 24 f8 00 00 00 00    movq    $0x0, -0x8(%rsp)
;       bp = getblk(vp, blkno, size, 0, INFSLP);
    1700: 49 89 c4                      movq    %rax, %r12
;       if (!ISSET(bp->b_flags, (B_DONE | B_DELWRI))) {
    1703: 48 f7 40 60 80 01 00 00       testq   $0x180, 0x60(%rax)      # imm = 0x180

Here it explodes dereferencing bp->b_flags. BTW, %rax seems to be 0x600,
since the fault address is 0x660 and the load is from 0x60(%rax). So
getblk() returned a bad buf ...

-- 
:wq Claudio
