On Sat, 2 Nov 2024 05:23:17 -0700 Eric Grosse <gro...@gmail.com> wrote:
> Since David Gwynne's locking changes in uvm on October 30, I have seen
> no more kernel panics on this workload (compiling /usr/src -j24 on
> 8core Power9) that consistently crashed before. Thank you, sir!

The tip, "compiling /usr/src -j24", was what I needed to reproduce the
problem on my powerpc64.  My kernel crashed into ddb at subr_pool.c:862,
because a page in pmap_pted_pool wasn't mapped.  My guess is that these
crashes and panics only happen on powerpc64, and always in
pmap_pted_pool or pmap_vp_pool.

My kernel has dlg's change (uvm_map r1.333, "try to simplify the locking
code around busy maps."), so that change didn't fix it.  My kernel has 2
diffs,
 - https://marc.info/?l=openbsd-bugs&m=171721578026377&w=2
   changing memory barriers from "isync" to "sync", not committed.
 - https://marc.info/?l=openbsd-tech&m=173053739517589&w=2
   before it became uvm_pdaemon.c r1.72.
My kernel does not have option WITNESS.  If it did, uvm_fault might have
panicked in a WITNESS check.

This is a 4-core POWER9 with 8G ram and 8448M swap.  I can't do "make
-j16 build" in /usr/src, because it runs out of ram and swap during
llvm-tblgen.  Instead, I do my usual "make -j4 build" in /usr/src, then
"make clean" and "make -j16" in libLLVM.

On 2024-11-05, "make -j16" in /usr/src/gnu/usr.bin/clang/libLLVM quickly
caused a c++ crash compiling AMDGPUArgumentUsageInfo.cpp,

c++: error: clang frontend command failed with exit code 139 ...
then entered ddb at a trap 0x300 (data storage trap),

dar 0xc000000086106f90 dsisr 0x42000000
trap type 300 srr1 9000000000009032 at 5fd5d0 lr 5fd584
Stopped at      pool_do_put+0x194:      std r3,0(r4)
ddb{2}> show panic
the kernel did not panic

This dar (data address) is inside kernel virtual address space,

                          segment   page offset
 VM_MIN_KERNEL_ADDRESS = c00000000_0000_000
 dar                   = c00000008_6106_f90
 VM_MAX_KERNEL_ADDRESS = c0000007f_ffff_fff

The dsisr (data storage interrupt status register) has 2 bits,
 4000_0000 = bit 33: not found in page table
 0200_0000 = bit 38: store
implying that the kernel is writing to unmapped memory.

ddb{2}> trace
pool_do_put+0x194
pool_put+0x94
pmap_remove_pted+0x29c
pmap_remove+0x130
uvm_unmap_kill_entry_withlock+0x1c0
uvm_unmap_remove+0x3cc
uvm_unmap+0x88
coredump_unmap+0x30
coredump_elf+0x15c
coredump+0x3a8
sigexit+0x88
postsig+0x304
userret+0xd8
trap+0x904
trapagain+0x4
--- trap (type 0x400) ---
End of kernel: 0xbffff4a8dee71e60 lr 0x47204c6c4
ddb{2}> mach cpuinfo
   0: stopped
   1: stopped
*  2: ddb
   3: stopped

cpu2 got trap 0x300 while dumping core for a user process that had
failed an instruction fetch (trap 0x400, probably SIGSEGV).
ddb{3}> trace
cpu_intr+0x50
xive_hvi+0x1b8
hvi_intr+0x38
trap+0xd4
trapagain+0x4
--- trap (type 0xea0) ---
mtx_enter+0x5c
pool_lock_mtx_enter+0x28
pool_get+0xa8
pmap_enter+0x1ac
uvm_fault_lower+0x8dc
uvm_fault+0x200
trap+0x4a8
trapagain+0x4
--- trap (type 0x300) ---
End of kernel: 0xbffff3c8eccd50f0 lr 0x4d86000fc
ddb{3}> mach ddbcpu 0
Stopped at      cpu_intr+0x50:  ori r0,r0,0x0
ddb{0}> trace
cpu_intr+0x50
xive_hvi+0x1b8
hvi_intr+0x38
trap+0xd4
trapagain+0x4
--- trap (type 0xea0) ---
_kernel_lock+0xe4
xive_hvi+0x1a0
hvi_intr+0x38
trap+0xd4
trapagain+0x4
--- trap (type 0xea0) ---
_kernel_lock+0xe0
xive_hvi+0x1a0
hvi_intr+0x38
trap+0xd4
trapagain+0x4
--- trap (type 0xea0) ---
End of kernel: 0xbffff1c6a40f6180 lr 0x135aaf60
ddb{0}> mach ddbcpu 1
Stopped at      cpu_intr+0x50:  ori r0,r0,0x0
ddb{1}> trace
cpu_intr+0x50
xive_hvi+0x1b8
hvi_intr+0x38
trap+0xd4
trapagain+0x4
--- trap (type 0xea0) ---
mtx_enter+0x5c
pool_lock_mtx_enter+0x28
pool_get+0xa8
pmap_enter+0x1ac
uvm_fault_lower+0x8dc
uvm_fault+0x200
trap+0x4a8
trapagain+0x4
--- trap (type 0x300) ---
End of kernel: 0xbffff56d6a95e3f0 lr 0x1540a180

The other CPUs were waiting for locks: cpu0 while handling an
interrupt, and cpu1 and cpu3 while handling trap 0x300 from user
processes.
ddb{1}> ps
   PID     TID   PPID    UID  S       FLAGS  WAIT     COMMAND
 84801   40873  67936   1000  7         0x3           c++
 33199  477867   7091   1000  2         0x3           c++
 96353  101250    531   1000  2         0x3           c++
 21344   18327  21102   1000  7         0x3           c++
 88128  182318  61620   1000  2         0x3           c++
 37806  184701  93992   1000  7      0x2803           c++
 78163  452720  97105   1000  2         0x3           c++
 58191  481502  78370   1000  2         0x3           c++
 96769   63855  59555   1000  2         0x3           c++
   531  224052  59207   1000  3    0x10008b  sigsusp  sh
 61620  285046  59207   1000  3    0x10008b  sigsusp  sh
 74865   41174  76091   1000  2         0x3           c++
 76803  128670  71611   1000  2         0x3           c++
 47620  110479   2022   1000  2         0x3           c++
 59555  262350  59207   1000  3    0x10008b  sigsusp  sh
 78370  393482  59207   1000  3    0x10008b  sigsusp  sh
 38651  409124  97630   1000  2         0x3           c++
  7091  203197  59207   1000  3    0x10008b  sigsusp  sh
 21102  213779  59207   1000  3    0x10008b  sigsusp  sh
 80444     184  69476   1000  2         0x3           c++
 97105  373651  59207   1000  3    0x10008b  sigsusp  sh
  2022  250517  59207   1000  3    0x10008b  sigsusp  sh
 76091  276059  59207   1000  3    0x10008b  sigsusp  sh
 97630  523875  59207   1000  3    0x10008b  sigsusp  sh
 71611  272837  59207   1000  3    0x10008b  sigsusp  sh
 69476  252045  59207   1000  3    0x10008b  sigsusp  sh
 88929  482059  33978   1000  2         0x3           c++
 93992  239732  59207   1000  3    0x10008b  sigsusp  sh
*58636  514616   2598   1000  7         0x3           c++
 67936  407623  59207   1000  3    0x10008b  sigsusp  sh
 33978  464490  59207   1000  3    0x10008b  sigsusp  sh
  2598  316802  59207   1000  3    0x10008b  sigsusp  sh
 59207  131223  86919   1000  3    0x10008b  sigsusp  make
 82725  478883  58813   1000  3    0x100083  kqread   top
 58813   12839  62361   1000  3    0x10008b  sigsusp  ksh
 62361  369886  71833   1000  3        0x98  kqread   sshd-session
 71833  317743  90483      0  3        0x92  kqread   sshd-session
 86919  190875  36007   1000  3    0x10008b  sigsusp  ksh
 36007   24811      1   1000  3    0x100080  kqread   tmux
 67124  167227  32763   1000  3    0x100083  kqread   tmux
 32763   20652  24825   1000  3    0x10008b  sigsusp  ksh
 24825  321382  22401   1000  3        0x98  kqread   sshd-session
 22401  341909  90483      0  3        0x92  kqread   sshd-session
 33839  412563      1      0  3    0x100083  ttyin    getty
 51382  509310      1      0  3    0x100098  kqread   cron
 35442  220424      1      0  3        0x80  kqread   apmd
 47321  438992      1     99  3   0x1100090  kqread   sndiod
 50236  450838      1    110  3    0x100090  kqread     sndiod
 56266  110208  24101     95  3   0x1100092  kqread     smtpd
 93717  189563  24101    103  3   0x1100092  kqread     smtpd
 65589  477803  24101     95  3   0x1100092  kqread     smtpd
 13555  168486  24101     95  3    0x100092  kqread     smtpd
 55171  516901  24101     95  3   0x1100092  kqread     smtpd
 23668  196445  24101     95  3   0x1100092  kqread     smtpd
 24101  500988      1      0  3    0x100080  kqread     smtpd
 90483  376863      1      0  3        0x88  kqread     sshd
 17145  283577      1      0  3    0x100080  kqread     ntpd
 78329  107955  89943     83  3    0x100092  kqread     ntpd
 89943  509297      1     83  3   0x1100092  kqread     ntpd
 48165  154592  93642     74  3   0x1100092  bpf        pflogd
 93642  275216      1      0  3        0x80  sbwait     pflogd
 41570  449178  33311     73  3   0x1100090  kqread     syslogd
 33311   20425      1      0  3    0x100082  sbwait     syslogd
 63280  414530      1      0  3    0x100080  kqread     resolvd
  9034  320677  11250     77  3    0x100092  kqread     dhcpleased
 59329  174084  11250     77  3    0x100092  kqread     dhcpleased
 11250  325196      1      0  3        0x80  kqread     dhcpleased
 81172  433081  71724    115  3    0x100092  kqread     slaacd
 32577  203131  71724    115  3    0x100092  kqread     slaacd
 71724  271786      1      0  3    0x100080  kqread     slaacd
 90514  160684      0      0  3     0x14200  bored      smr
  9849  316752      0      0  2     0x14200             zerothread
 37944  159183      0      0  3     0x14200  aiodoned   aiodoned
 83959   87621      0      0  3     0x14200  syncer     update
 39840   93501      0      0  3     0x14200  cleaner    cleaner
 60281  230067      0      0  3     0x14200  reaper     reaper
 35703  239731      0      0  3     0x14200  pgdaemon   pagedaemon
 80142  370629      0      0  3     0x14200  bored      srdis
 42472  341809      0      0  3     0x14200  usbtsk     usbtask
 18228  293761      0      0  3     0x14200  usbatsk    usbatsk
 78224  352752      0      0  3     0x14200  bored      ipmicmd
 92823  380649      0      0  3     0x14200  ipmi_poll  ipmi0
 77480  294029      0      0  3     0x14200  bored      sensors
 96848  114487      0      0  3  0x40014200             idle3
 37819  295743      0      0  3  0x40014200             idle2
 42420  517275      0      0  3  0x40014200             idle1
 38497  366277      0      0  3     0x14200  bored      softnet3
   987  386467      0      0  3     0x14200  bored      softnet2
 41113  127693      0      0  3     0x14200  bored      softnet1
 82856  280125      0      0  3     0x14200  bored      softnet0
 21154   82752      0      0  3     0x14200  bored      systqmp
  2715  215307      0      0  3     0x14200  bored      systq
 51831   29420      0      0  3     0x14200  tmoslp     softclockmp
 93939  414377      0      0  3  0x40014200  tmoslp     softclock
 28085  398091      0      0  3  0x40014200             idle0
 64071   52673      0      0  3     0x14200  kmalloc    kmthread
     1  193924      0      0  3        0x82  wait       init
     0       0     -1      0  3     0x10200  scheduler  swapper
ddb{1}> ps/o
    TID     PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
  40873   84801   1000         0x3          0    3  c++
  18327   21344   1000         0x3          0    0  c++
 184701   37806   1000       0x803     0x2000   2K  c++
*514616   58636   1000         0x3          0    1  c++

Need to switch back to cpu2 for its $dar.

ddb{1}> mach ddbcpu 2
Stopped at      pool_do_put+0x194:      std r3,0(r4)
ddb{2}> print/x $dar
c000000086106f90
ddb{2}> call pmap_get_kernel_pte($dar)
0
ddb{2}> call pmap_vp_lookup(kernel_pmap_store, $dar)
0

The NULL from pmap_get_kernel_pte implies that $dar isn't in
pmap_ptable, so $dar points to unmapped memory.  (This is consistent
with bit 33 in dsisr.)  The NULL from pmap_vp_lookup implies that $dar
isn't in the kernel's VP table.  (Memory from pmap_kenter_pa would not
be in the VP table but would be PTE_WIRED in pmap_ptable.)

ddb{2}> x/qx kernel_map
kernel_map:     ad6160
ddb{2}> show map/f 0ad6160
MAP 0xad6160: [0xc000000000000000->0xc0000007fffff000]
...
 - 0xc0000000109b7c78: 0xc000000085c61000->0xc00000008ccb3000:
        obj=0x0/0x0, amap=0x0/0
        submap=F, cow=F, nc=F, stack=F, prot(max)=3/3, inh=2, wc=0, adv=1
        hole=F, free=T, guard=0x0, free=0xc00000008ccb3000-0xc0000007fffff000
        fspace_augment=31997607936
        freemapped=T, uaddr=0xc0000000109caf60
        (0xc000000000000000-0xc0000007fffff000 uaddr_bestfit)
...
$dar is inside this entry in kernel_map in uvm,

 start = c00000008_5c61_000
 dar   = c00000008_6106_f90
 end   = c00000008_ccb3_000

prot=3 is just PROT_READ | PROT_WRITE.

A day later, on 2024-11-06, I got line numbers from objdump -dlr,

ddb{2}> trace
pool_do_put+0x194                    -> subr_pool.c:862
pool_put+0x94                        -> subr_pool.c:797
pmap_remove_pted+0x29c               -> pmap.c:888
pmap_remove+0x130                    -> pmap.c:1151
uvm_unmap_kill_entry_withlock+0x1c0  -> uvm_map.c:1916
...

(pmap.c r1.62, subr_pool.c r1.236, uvm_map.c r1.333)

subr_pool.c:862 is the TAILQ_REMOVE in

		/*
		 * The page was previously completely full, move it to the
		 * partially-full list.
		 */
		TAILQ_REMOVE(&pp->pr_fullpages, ph, ph_entry);

The "std r3,0(r4)" that trapped is either

		(elm)->field.tqe_next->field.tqe_prev =		\
		    (elm)->field.tqe_prev;			\
or
		(head)->tqh_last = (elm)->field.tqe_prev;	\

in TAILQ_REMOVE in <sys/queue.h>.

pmap.c:888 refers to the pmap_pted_pool, which contains the struct
pte_desc for the VP table and PV lists.  I'm guessing that
(elm)->field.tqe_next pointed to an unmapped page.  pmap_pted_pool
probably uses pool_page_alloc in subr_pool.c, which, via km_alloc in
uvm_km.c, probably uses pmap_kenter_pa; so (elm)->field.tqe_next should
have pointed to a mapped page with a PTE_WIRED entry from
pmap_kenter_pa.

--gkoehler