On Mon, Jun 30, 2025 at 2:39 AM K R <[email protected]> wrote:
>
> On Fri, Jun 27, 2025 at 3:34 AM Martin Pieuchot <[email protected]> wrote:
> >
> > On 26/06/25(Thu) 11:02, K R wrote:
> > > On Wed, Jun 25, 2025 at 1:30 PM K R <[email protected]> wrote:
> > > >
> > > > [...]
> > > > > Hi Alexander,
> > > > >
> > > > > The good news: I can consistently reproduce the hang problem.  But the
> > > > > bad news is that even with a WITNESS kernel and kern.witness.watch=2
> > > > > (or even 3) I don't see any message or kernel panic.
> >
> > Do you mind sharing your recipe to reproduce the hang?

Another one.  This time I've included output from 'show witness /b', below:

pmap_tlb_shootwait: spun outStopped at  db_enter+0x14:  popq    %rbp
ddb{3}> show locks
exclusive rwlock amaplk r = 0 (0xfffffd9619a06de0)
shared rwlock vmmaplk r = 0 (0xfffffd9ca1d22e38)
ddb{3}> show witness /b
Number of known direct relationships is 400

Lock order reversal between "&mp->mnt_lock"(rwlock) and "&ip->i_lock"(rrwlock)!

witness: incomplete path, depth 4
ddb{3}> trace
db_enter() at db_enter+0x14
pmap_enter(fffffda07ffdfc00,9f5a20cd000,160950d000,3,22) at pmap_enter+0x725
uvm_fault_lower(ffff800055e422b8,ffff800055e422f0,ffff800055e42230) at uvm_faul
t_lower+0x27f
uvm_fault(fffffd9ca1d22d38,9f5a20cd000,0,2) at uvm_fault+0x1c5
upageflttrap(ffff800055e42420,9f5a20cd000) at upageflttrap+0x6c
usertrap(ffff800055e42420) at usertrap+0x1f5
recall_trap() at recall_trap+0x8
end of kernel
end trace frame: 0x9f54e22cfd0, count: -7
ddb{3}> show uvm
Current UVM status:
  pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12
  32128636 VM pages: 44272 active, 176519 inactive, 1 wired, 24475351 free (341
0002 zero)
  freemin=1070954, free-target=1427938, inactive-target=0, wired-max=10709545
  faults=214303293, traps=213079270, intrs=520397, ctxswitch=19116075 fpuswitch
=0
  softint=775344, syscalls=9310911, kmapent=11
  fault counts:
    noram=0, noanon=0, noamap=0, pgwait=0, pgrele=0
    relocks=288640(4474), upgrades=0(0) anget(retries)=187343(0), amapcopy=1473
6962
    neighbor anon/obj pg=35402/115880, gets(lock/unlock)=347448/293136
    cases: anon=172569, anoncow=14774, obj=333552, prcopy=9400, przero=21377198
5
  daemon and swap counts:
    woke=0, revs=0, scans=0, obscans=0, anscans=0
    busy=0, freed=0, reactivate=0, deactivate=0
    pageouts=0, pending=0, nswget=0
    nswapdev=1
    swpages=16777883, swpginuse=0, swpgonly=0 paging=0
  kernel pointers:
    objs(kern)=0xffffffff82a8e468
ddb{3}> show bcstats
Current Buffer Cache status:
numbufs 502298 busymapped 0, delwri 4
kvaslots 5221 avail kva slots 5220
bufpages 7357580, dmapages 83476, dirtypages 64
pendingreads 0, pendingwrites 0
highflips 504530, highflops 0, dmaflips 9697
ddb{3}> ps /o
    TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
 212315  82617      0         0x3          0    9  nfdump
 232570  82617      0         0x3  0x4000000    4  nfdump
*266568  55967      0         0x3  0x4000000    3  nfdump
 117929  55967      0         0x3  0x4000000    0K nfdump
 151763  18161      0         0x3          0   10  nfdump
  88291  18161      0         0x3  0x4000000    2  nfdump



>
> Now a protection fault trap, below.  This is a -current kernel
> compiled with WITNESS and MP_LOCKDEBUG.  Please let me know if you
> need to disable these options or enable something else.
>
> Thanks,
> --Kor
>
> kernel: protection fault trap, code=0
> Stopped at      Xipi_invlrange_pcid+0x37:
> show panic
> the kernel did not panic
> ddb{11}> trace
> Xipi_invlrange_pcid() at Xipi_invlrange_pcid+0x37
> Xsyscall() at Xsyscall+0x128
> end of kernel
> end trace frame: 0x77e7d698c600, count: -2
> ddb{11}> show locks
> ddb{11}> show uvm
> Current UVM status:
>   pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12
>   32128632 VM pages: 48715 active, 194971 inactive, 1 wired, 12073171 free 
> (179
> 3670 zero)
>   freemin=1070954, free-target=1427938, inactive-target=5649476, 
> wired-max=1070
> 9544
>   faults=-61402478, traps=1000373307, intrs=109644593, ctxswitch=-710638854 
> fpu
> switch=0
>   softint=1129844981, syscalls=-541527240, kmapent=11
>   fault counts:
>     noram=0, noanon=0, noamap=0, pgwait=0, pgrele=0
>     relocks=313014(4555), upgrades=0(0) anget(retries)=19314639(0), 
> amapcopy=11
> 42296398
>     neighbor anon/obj pg=487865/1083289, gets(lock/unlock)=672813/317589
>     cases: anon=19173423, anoncow=141216, obj=596517, prcopy=71721, 
> przero=-813
> 86369
>   daemon and swap counts:
>     woke=1576, revs=0, scans=0, obscans=0, anscans=0
>     busy=0, freed=0, reactivate=0, deactivate=0
>     pageouts=0, pending=0, nswget=0
>     nswapdev=1
>     swpages=16777883, swpginuse=0, swpgonly=0 paging=0
>   kernel pointers:
>     objs(kern)=0xffffffff82b406d8
> ddb{11}> show bcstats
> Current Buffer Cache status:
> numbufs 1328555 busymapped 1, delwri 4
> kvaslots 5221 avail kva slots 5220
> bufpages 19307018, dmapages 83470, dirtypages 52
> pendingreads 1, pendingwrites 0
> highflips 105286762, highflops 0, dmaflips 27731
> ddb{11}> ps /o
>     TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
>   91867  43631      0         0x3  0x4000000    2  semaphore
>  521718  23585      0         0x3  0x4000000   12  semaphore
>  101939  23757      0         0x3  0x4000000    1  semaphore
>  207251  37453      0         0x3  0x4000000   10  semaphore
>  100237  19899      0         0x3  0x4000000    7  semaphore
>  397618  64074      0         0x3  0x4000000    8  semaphore
>  147922  39167      0         0x3  0x4000000    4  semaphore
>   41621  25216      0         0x3  0x4000000   14  semaphore
> *111074  74925      0         0x3          0   11  nfdump
>   63333  74925      0         0x3  0x4000000    0  nfdump
>  413988  10639      0         0x3          0   15  nfdump
>  194585  10639      0         0x3  0x4000000    3  nfdump
>   79353  71900      0         0x3          0    6  nfdump
>  171795  71900      0         0x3  0x4000000   13K nfdump
>
> >
> > > > > Any additional suggestion in order to increase debug information or
> > > > > force the machine to go into ddb?
> > > >
> > > > I tried a MP_LOCKDEBUG kernel and finally got a panic, shown below.
> > > > Sorry for not including 'show locks' and all the cpus, but my serial
> > >
> > > Another hang, now with show locks:
> > >
> > > ddb{2}> show locks
> > > exclusive rwlock amaplk r = 0 (0xfffffd9a244e4e78)
> > > exclusive rwlock vmmaplk r = 0 (0xfffffd9c93c4ca90)
> > > --
> > > ddb{2}> ps /o
> > >     TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
> > >  322239  87810      0         0x3  0x4000000    0  semaphore
> > >   86705  31331      0         0x3  0x4000000   11  semaphore
> > >  383438  44550      0         0x3  0x4000000   14  semaphore
> > >  257640  77711      0         0x3  0x4000000   13  semaphore
> > >  450354  41159      0         0x3  0x4000000    7  semaphore
> > >  427275  25212      0         0x3  0x4000000   12  semaphore
> > >  227039  47648      0         0x3  0x4000000    8  semaphore
> > >   28812  15439      0         0x3  0x4000000    6  semaphore
> > >   37644  76378      0    0x100003          0    3K top
> > > *305600   7089      0         0x3          0    2  nfdump
> > >  192362   7089      0         0x3  0x4000000    9  nfdump
> > >  160550  56545      0         0x3  0x4000000    4  nfdump
> > >  367292  98858      0         0x3  0x4000000    1  nfdump
> > > --
> > > ddb{2}> trace
> > > db_enter() at db_enter+0x14
> > > pmap_do_remove(fffffd9c7aa25100,d6734d0000,d6734e0000,0) at 
> > > pmap_do_remove+0x4a9
> > > uvm_unmap_kill_entry_withlock(fffffd9c93c4c990,fffffd9c80c83258,1) at
> > > uvm_unmap_kill_entry_withlock+0x133
> > > uvm_unmap_remove(fffffd9c93c4c990,d6734a0000,d6739a0000,ffff800055feaed0,0,1,7ff574ce4a271e31)
> > > at uvm_unmap_remove+0x32f
> > > sys_munmap(ffff8000fffe6070,ffff800055feafd0,ffff800055feaf50) at
> > > sys_munmap+0x10b
> > > syscall(ffff800055feafd0) at syscall+0x5f9
> > > Xsyscall() at Xsyscall+0x128
> > > end of kernel
> > > end trace frame: 0x7365c97d4f10, count: -7
> > > --
> > > ddb{2}> machine ddbcpu 0
> > > Stopped at      x86_ipi_db+0x16:        leave
> > >
> > > ddb{0}> trace
> > > x86_ipi_db(ffffffff829e6ff0) at x86_ipi_db+0x16
> > > x86_ipi_handler() at x86_ipi_handler+0x80
> > > Xresume_lapic_ipi() at Xresume_lapic_ipi+0x27
> > > __mp_lock(ffffffff82b2d348) at __mp_lock+0xbe
> > > softintr_dispatch(0) at softintr_dispatch+0xeb
> > > dosoftint(0) at dosoftint+0x3f
> > > Xsoftclock() at Xsoftclock+0x27
> > > end of kernel
> > > end trace frame: 0x71789b60c60, count: -7
> > > --
> > > ddb{1}> machine ddbcpu 2
> > > Stopped at      db_enter+0x14:  popq    %rbp
> > > ddb{2}> trace
> > > db_enter() at db_enter+0x14
> > > pmap_do_remove(fffffd9c7aa25100,d6734d0000,d6734e0000,0) at 
> > > pmap_do_remove+0x4a9
> > > uvm_unmap_kill_entry_withlock(fffffd9c93c4c990,fffffd9c80c83258,1) at
> > > uvm_unmap_kill_entry_withlock+0x133
> > > uvm_unmap_remove(fffffd9c93c4c990,d6734a0000,d6739a0000,ffff800055feaed0,0,1,7ff574ce4a271e31)
> > > at uvm_unmap_remove+0x32f
> > > sys_munmap(ffff8000fffe6070,ffff800055feafd0,ffff800055feaf50) at
> > > sys_munmap+0x10b
> > > syscall(ffff800055feafd0) at syscall+0x5f9
> > > Xsyscall() at Xsyscall+0x128
> > > end of kernel
> > > end trace frame: 0x7365c97d4f10, count: -7
> > > --
> > >
> > > > console session froze before that.
> > > >
> > > > Hope it helps,
> > > > --Kor
> > > >
> > > > pmap_tlb_shootwait: spun out
> > > > Stopped at  db_enter+0x14:  popq    %rbp
> > > > ddb{13}>
> > > > --
> > > > ddb{13}> show panic
> > > > the kernel did not panic
> > > > --
> > > > ddb{13}> ps /o
> > > >     TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
> > > >  307801  52176      0         0x2  0x4000000   10  semaphore
> > > >  410431  54690      0         0x2  0x4000000   15  semaphore
> > > >  257513  26918      0         0x2  0x4000000   11  semaphore
> > > >  138054  68532      0         0x2  0x4000000    5  semaphore
> > > >  203059   2408      0         0x2  0x4000000    9  semaphore
> > > >   89141   5998      0         0x2  0x4000000    8  semaphore
> > > >  263439  99498      0         0x2  0x4000000    7  semaphore
> > > >  477129  15941      0         0x2  0x4000000   12  semaphore
> > > >   41845  21800      0         0x3  0x4000000    2  nfdump
> > > >  371022  21800      0         0x3  0x4000000    0  nfdump
> > > > *391905  66338      0         0x3  0x4000000   13  nfdump
> > > >  367615  66338      0         0x3  0x4000000    6  nfdump
> > > >  394180  56727      0         0x3  0x4000000    3  nfdump
> > > >   46184  33300      0    0x100003          0    1K top
> > > > --
> > > > ddb{13}> trace
> > > > db_enter() at db_enter+0x14
> > > > pmap_page_remove(fffffd8184bdc408) at pmap_page_remove+0x38e
> > > > uvm_anfree_list(fffffd9c88e2be50,0) at uvm_anfree_list+0x63
> > > > amap_wipeout(fffffd9ca390fc88) at amap_wipeout+0x198
> > > > uvm_unmap_detach(ffff80005608fcc0,0) at uvm_unmap_detach+0xbe
> > > > sys_munmap(ffff800055cda540,ffff80005608fdc0,ffff80005608fd40) at
> > > > sys_munmap+0x185
> > > > syscall(ffff80005608fdc0) at syscall+0x5f9
> > > > Xsyscall() at Xsyscall+0x128
> > > > end of kernel
> > > > end trace frame: 0x123cf0e2b80, count: -8
> > > > --
> > > > ddb{13}> machine ddbcpu 1
> > > > Stopped at      x86_ipi_db+0x16:        leave
> > > >
> > > > ddb{1}> trace
> > > > x86_ipi_db(ffff8000552aaff0) at x86_ipi_db+0x16
> > > > x86_ipi_handler() at x86_ipi_handler+0x80
> > > > Xresume_lapic_ipi() at Xresume_lapic_ipi+0x27
> > > > pmap_enter(fffffd9cd8988d00,7103880b6000,1c7767a000,3,33) at 
> > > > pmap_enter+0x703
> > > > uvm_fault_upper(ffff800055da4928,ffff800055da4960,ffff800055da4820) at
> > > > uvm_fault_upper+0x1e0
> > > > uvm_fault(fffffd9ca1ab8b58,7103880b6000,2,3) at uvm_fault+0xce
> > > > uvm_fault_wire(fffffd9ca1ab8b58,7103880b6000,7103880b7000,3) at
> > > > uvm_fault_wire+0x58
> > > > uvm_map_pageable_wire(fffffd9ca1ab8b58,fffffd9c8f522700,fffffd9ca0915ab8,0,3,0)
> > > >  at uvm_map_pageable_wire+0x19d
> > > > sysctl_vslock(7103880b6d68,18) at sysctl_vslock+0x9b
> > > > sys_sysctl(ffff800055d61c60,ffff800055da4c80,ffff800055da4c00) at
> > > > sys_sysctl+0x187
> > > > syscall(ffff800055da4c80) at syscall+0x5f9
> > > > Xsyscall() at Xsyscall+0x128
> > > > end of kernel
> > > > end trace frame: 0x7103880b6dc0, count: -12
> > > > --
> > > >
> > > > [...]
> > >
> >
> >

Reply via email to