On Sat, 2 Nov 2024 05:23:17 -0700
Eric Grosse <gro...@gmail.com> wrote:

> Since David Gwynne's locking changes in uvm on October 30, I have seen
> no more kernel panics on this workload (compiling /usr/src -j24 on
> 8core Power9) that consistently crashed before. Thank you, sir!

The tip, "compiling /usr/src -j24", was what I needed to reproduce the
problem on my powerpc64.  My kernel crashed into ddb at
subr_pool.c:862, because a page in pmap_pted_pool wasn't mapped.  My
guess is that these crashes and panics only happen on powerpc64, and
always in pmap_pted_pool or pmap_vp_pool.

My kernel has dlg's change (uvm_map r1.333, try to simplify the
locking code around busy maps.), so the change didn't fix it.

My kernel has 2 diffs,
 - https://marc.info/?l=openbsd-bugs&m=171721578026377&w=2
   changing memory barriers from "isync" to "sync", not committed.
 - https://marc.info/?l=openbsd-tech&m=173053739517589&w=2
   before it became uvm_pdaemon.c r1.72.

My kernel does not have option WITNESS.  If it did, uvm_fault might
have panicked in a WITNESS check.

This is a 4-core POWER9 with 8G ram and 8448M swap.  I can't do
"make -j16 build" in /usr/src, because it runs out of ram and swap
during llvm-tblgen.  Instead, I do my usual "make -j4 build" in
/usr/src, then "make clean" and "make -j16" in libLLVM.

On 2024-11-05, "make -j16" in /usr/src/gnu/usr.bin/clang/libLLVM
quickly caused a c++ crash compiling AMDGPUArgumentUsageInfo.cpp,

c++: error: clang frontend command failed with exit code 139 ...

then entered ddb at a trap 0x300 (data storage trap),

dar 0xc000000086106f90 dsisr 0x42000000
trap type 300 srr1 9000000000009032 at 5fd5d0 lr 5fd584
Stopped at      pool_do_put+0x194:      std r3,0(r4)
ddb{2}> show panic
the kernel did not panic

This dar (data address) is inside kernel virtual address space,
                            segment page
  VM_MIN_KERNEL_ADDRESS = c00000000_0000_000
                    dar = c00000008_6106_f90
  VM_MAX_KERNEL_ADDRESS = c0000007f_ffff_fff

The dsisr (data storage interrupt status register) has 2 bits,
  4000_0000 = bit 33: not found in page table
  0200_0000 = bit 38: store
implying that the kernel is writing to unmapped memory.

ddb{2}> trace
pool_do_put+0x194
pool_put+0x94
pmap_remove_pted+0x29c
pmap_remove+0x130
uvm_unmap_kill_entry_withlock+0x1c0
uvm_unmap_remove+0x3cc
uvm_unmap+0x88
coredump_unmap+0x30
coredump_elf+0x15c
coredump+0x3a8
sigexit+0x88
postsig+0x304
userret+0xd8
trap+0x904
trapagain+0x4
--- trap (type 0x400) ---
End of kernel: 0xbffff4a8dee71e60 lr 0x47204c6c4
ddb{2}> mach cpuinfo
    0: stopped
    1: stopped
*   2: ddb
    3: stopped

cpu2 got trap 0x300 while dumping core for a user process that failed
an instruction fetch (trap 0x400, probably SIGSEGV).

ddb{3}> trace
cpu_intr+0x50
xive_hvi+0x1b8
hvi_intr+0x38
trap+0xd4
trapagain+0x4
--- trap (type 0xea0) ---
mtx_enter+0x5c
pool_lock_mtx_enter+0x28
pool_get+0xa8
pmap_enter+0x1ac
uvm_fault_lower+0x8dc
uvm_fault+0x200
trap+0x4a8
trapagain+0x4
--- trap (type 0x300) ---
End of kernel: 0xbffff3c8eccd50f0 lr 0x4d86000fc
ddb{3}> mach ddbcpu 0
Stopped at      cpu_intr+0x50:  ori r0,r0,0x0
ddb{0}> trace
cpu_intr+0x50
xive_hvi+0x1b8
hvi_intr+0x38
trap+0xd4
trapagain+0x4
--- trap (type 0xea0) ---
_kernel_lock+0xe4
xive_hvi+0x1a0
hvi_intr+0x38
trap+0xd4
trapagain+0x4
--- trap (type 0xea0) ---
_kernel_lock+0xe0
xive_hvi+0x1a0
hvi_intr+0x38
trap+0xd4
trapagain+0x4
--- trap (type 0xea0) ---
End of kernel: 0xbffff1c6a40f6180 lr 0x135aaf60
ddb{0}> mach ddbcpu 1
Stopped at      cpu_intr+0x50:  ori r0,r0,0x0
ddb{1}> trace
cpu_intr+0x50
xive_hvi+0x1b8
hvi_intr+0x38
trap+0xd4
trapagain+0x4
--- trap (type 0xea0) ---
mtx_enter+0x5c
pool_lock_mtx_enter+0x28
pool_get+0xa8
pmap_enter+0x1ac
uvm_fault_lower+0x8dc
uvm_fault+0x200
trap+0x4a8
trapagain+0x4
--- trap (type 0x300) ---
End of kernel: 0xbffff56d6a95e3f0 lr 0x1540a180

The other CPUs were waiting for locks, while handling an interrupt on
cpu0, and trap 0x300 from user processes on cpu1 and cpu3.

ddb{1}> ps
   PID     TID   PPID    UID  S       FLAGS  WAIT          COMMAND
 84801   40873  67936   1000  7         0x3                c++
 33199  477867   7091   1000  2         0x3                c++
 96353  101250    531   1000  2         0x3                c++
 21344   18327  21102   1000  7         0x3                c++
 88128  182318  61620   1000  2         0x3                c++
 37806  184701  93992   1000  7      0x2803                c++
 78163  452720  97105   1000  2         0x3                c++
 58191  481502  78370   1000  2         0x3                c++
 96769   63855  59555   1000  2         0x3                c++
   531  224052  59207   1000  3    0x10008b  sigsusp       sh
 61620  285046  59207   1000  3    0x10008b  sigsusp       sh
 74865   41174  76091   1000  2         0x3                c++
 76803  128670  71611   1000  2         0x3                c++
 47620  110479   2022   1000  2         0x3                c++
 59555  262350  59207   1000  3    0x10008b  sigsusp       sh
 78370  393482  59207   1000  3    0x10008b  sigsusp       sh
 38651  409124  97630   1000  2         0x3                c++
  7091  203197  59207   1000  3    0x10008b  sigsusp       sh
 21102  213779  59207   1000  3    0x10008b  sigsusp       sh
 80444     184  69476   1000  2         0x3                c++
 97105  373651  59207   1000  3    0x10008b  sigsusp       sh
  2022  250517  59207   1000  3    0x10008b  sigsusp       sh
 76091  276059  59207   1000  3    0x10008b  sigsusp       sh
 97630  523875  59207   1000  3    0x10008b  sigsusp       sh
 71611  272837  59207   1000  3    0x10008b  sigsusp       sh
 69476  252045  59207   1000  3    0x10008b  sigsusp       sh
 88929  482059  33978   1000  2         0x3                c++
 93992  239732  59207   1000  3    0x10008b  sigsusp       sh
*58636  514616   2598   1000  7         0x3                c++
 67936  407623  59207   1000  3    0x10008b  sigsusp       sh
 33978  464490  59207   1000  3    0x10008b  sigsusp       sh
  2598  316802  59207   1000  3    0x10008b  sigsusp       sh
 59207  131223  86919   1000  3    0x10008b  sigsusp       make
 82725  478883  58813   1000  3    0x100083  kqread        top
 58813   12839  62361   1000  3    0x10008b  sigsusp       ksh
 62361  369886  71833   1000  3        0x98  kqread        sshd-session
 71833  317743  90483      0  3        0x92  kqread        sshd-session
 86919  190875  36007   1000  3    0x10008b  sigsusp       ksh
 36007   24811      1   1000  3    0x100080  kqread        tmux
 67124  167227  32763   1000  3    0x100083  kqread        tmux
 32763   20652  24825   1000  3    0x10008b  sigsusp       ksh
 24825  321382  22401   1000  3        0x98  kqread        sshd-session
 22401  341909  90483      0  3        0x92  kqread        sshd-session
 33839  412563      1      0  3    0x100083  ttyin         getty
 51382  509310      1      0  3    0x100098  kqread        cron
 35442  220424      1      0  3        0x80  kqread        apmd
 47321  438992      1     99  3   0x1100090  kqread        sndiod
 50236  450838      1    110  3    0x100090  kqread        sndiod
 56266  110208  24101     95  3   0x1100092  kqread        smtpd
 93717  189563  24101    103  3   0x1100092  kqread        smtpd
 65589  477803  24101     95  3   0x1100092  kqread        smtpd
 13555  168486  24101     95  3    0x100092  kqread        smtpd
 55171  516901  24101     95  3   0x1100092  kqread        smtpd
 23668  196445  24101     95  3   0x1100092  kqread        smtpd
 24101  500988      1      0  3    0x100080  kqread        smtpd
 90483  376863      1      0  3        0x88  kqread        sshd
 17145  283577      1      0  3    0x100080  kqread        ntpd
 78329  107955  89943     83  3    0x100092  kqread        ntpd
 89943  509297      1     83  3   0x1100092  kqread        ntpd
 48165  154592  93642     74  3   0x1100092  bpf           pflogd
 93642  275216      1      0  3        0x80  sbwait        pflogd
 41570  449178  33311     73  3   0x1100090  kqread        syslogd
 33311   20425      1      0  3    0x100082  sbwait        syslogd
 63280  414530      1      0  3    0x100080  kqread        resolvd
  9034  320677  11250     77  3    0x100092  kqread        dhcpleased
 59329  174084  11250     77  3    0x100092  kqread        dhcpleased
 11250  325196      1      0  3        0x80  kqread        dhcpleased
 81172  433081  71724    115  3    0x100092  kqread        slaacd
 32577  203131  71724    115  3    0x100092  kqread        slaacd
 71724  271786      1      0  3    0x100080  kqread        slaacd
 90514  160684      0      0  3     0x14200  bored         smr
  9849  316752      0      0  2     0x14200                zerothread
 37944  159183      0      0  3     0x14200  aiodoned      aiodoned
 83959   87621      0      0  3     0x14200  syncer        update
 39840   93501      0      0  3     0x14200  cleaner       cleaner
 60281  230067      0      0  3     0x14200  reaper        reaper
 35703  239731      0      0  3     0x14200  pgdaemon      pagedaemon
 80142  370629      0      0  3     0x14200  bored         srdis
 42472  341809      0      0  3     0x14200  usbtsk        usbtask
 18228  293761      0      0  3     0x14200  usbatsk       usbatsk
 78224  352752      0      0  3     0x14200  bored         ipmicmd
 92823  380649      0      0  3     0x14200  ipmi_poll     ipmi0
 77480  294029      0      0  3     0x14200  bored         sensors
 96848  114487      0      0  3  0x40014200                idle3
 37819  295743      0      0  3  0x40014200                idle2
 42420  517275      0      0  3  0x40014200                idle1
 38497  366277      0      0  3     0x14200  bored         softnet3
   987  386467      0      0  3     0x14200  bored         softnet2
 41113  127693      0      0  3     0x14200  bored         softnet1
 82856  280125      0      0  3     0x14200  bored         softnet0
 60281  230067      0      0  3     0x14200  reaper        reaper
 35703  239731      0      0  3     0x14200  pgdaemon      pagedaemon
 80142  370629      0      0  3     0x14200  bored         srdis
 42472  341809      0      0  3     0x14200  usbtsk        usbtask
 18228  293761      0      0  3     0x14200  usbatsk       usbatsk
 78224  352752      0      0  3     0x14200  bored         ipmicmd
 92823  380649      0      0  3     0x14200  ipmi_poll     ipmi0
 77480  294029      0      0  3     0x14200  bored         sensors
 96848  114487      0      0  3  0x40014200                idle3
 37819  295743      0      0  3  0x40014200                idle2
 42420  517275      0      0  3  0x40014200                idle1
 38497  366277      0      0  3     0x14200  bored         softnet3
   987  386467      0      0  3     0x14200  bored         softnet2
 41113  127693      0      0  3     0x14200  bored         softnet1
 82856  280125      0      0  3     0x14200  bored         softnet0
 21154   82752      0      0  3     0x14200  bored         systqmp
  2715  215307      0      0  3     0x14200  bored         systq
 51831   29420      0      0  3     0x14200  tmoslp        softclockmp
 93939  414377      0      0  3  0x40014200  tmoslp        softclock
 28085  398091      0      0  3  0x40014200                idle0
 64071   52673      0      0  3     0x14200  kmalloc       kmthread
     1  193924      0      0  3        0x82  wait          init
     0       0     -1      0  3     0x10200  scheduler     swapper
ddb{1}> ps/o
    TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
  40873  84801   1000         0x3          0    3  c++
  18327  21344   1000         0x3          0    0  c++
 184701  37806   1000       0x803     0x2000    2K c++
*514616  58636   1000         0x3          0    1  c++

Need to switch back to cpu2 for its $dar.

ddb{1}> mach ddbcpu 2               
Stopped at      pool_do_put+0x194:      std r3,0(r4)
ddb{2}> print/x $dar
c000000086106f90
ddb{2}> call pmap_get_kernel_pte($dar)
0
ddb{2}> call pmap_vp_lookup(kernel_pmap_store, $dar)
0

The NULL from pmap_get_kernel_pte implies that $dar isn't in
pmap_ptable, so $dar points to unmapped memory.  (This is consistent
with bit 33 in dsisr.)  The NULL from pmap_vp_lookup implies that $dar
isn't in the kernel's VP table.  (Memory from pmap_kenter_pa would not
be in the VP table but would be PTE_WIRED in pmap_ptable.)

ddb{2}> x/qx kernel_map
kernel_map:                   ad6160
ddb{2}> show map/f 0ad6160
MAP 0xad6160: [0xc000000000000000->0xc0000007fffff000]
...
 - 0xc0000000109b7c78: 0xc000000085c61000->0xc00000008ccb3000: obj=0x0/0x0, ama
p=0x0/0
        submap=F, cow=F, nc=F, stack=F, prot(max)=3/3, inh=2, wc=0, adv=1
        hole=F, free=T, guard=0x0, free=0xc00000008ccb3000-0xc0000007fffff000
        fspace_augment=31997607936
        freemapped=T, uaddr=0xc0000000109caf60
                (0xc000000000000000-0xc0000007fffff000 uaddr_bestfit)
...

$dar is inside this entry in kernel_map in uvm,
  start = c00000008_5c61_000
    dar = c00000008_6106_f90
    end = c00000008_ccb3_000
prot=3 is just PROT_READ | PROT_WRITE.

A day later, on 2024-11-06, I got line numbers from objdump -dlr,

ddb{2}> trace
pool_do_put+0x194 -> subr_pool.c:862
pool_put+0x94 -> subr_pool.c:797
pmap_remove_pted+0x29c -> pmap.c:888
pmap_remove+0x130 -> pmap.c:1151
uvm_unmap_kill_entry_withlock+0x1c0 -> uvm_map.c:1916
... pmap.c r1.62, subr_pool.c r1.236, uvm_map.c r1.333

subr_pool.c:862 is the TAILQ_REMOVE in
                /*
                 * The page was previously completely full, move it to the
                 * partially-full list.
                 */
                TAILQ_REMOVE(&pp->pr_fullpages, ph, ph_entry);

The "std r3,0(r4)" that trapped is either
                (elm)->field.tqe_next->field.tqe_prev =                 \
                    (elm)->field.tqe_prev;                              \
or
                (head)->tqh_last = (elm)->field.tqe_prev;               \
in TAILQ_REMOVE in <sys/queue.h>.

pmap.c:888 refers to the pmap_pted_pool, which contains the struct
pte_desc for the VP table and PV lists.  I'm guessing that
(elm)->field.tqe_next pointed to an unmapped page.

pmap_pted_pool probably uses pool_page_alloc in subr_pool.c, which via
km_alloc in uvm_km.c, probably uses pmap_kenter_pa; so
(elm)->field.tqe_next should have pointed to a mapped page with a
PTE_WIRED entry from pmap_kenter_pa.
--gkoehler

Reply via email to