openbsd-ppc64-n2vi got another crash: UVM_PSEG_INUSE failed uvm_pager.c:227
panic uvm_pseg_release uvn_io uvn_get uvm_fault_lower uvm_fault trap trapagain type 300 during a bunch of go compiles.

On Mon, May 27, 2024 at 5:34 PM Jeremie Courreges-Anglas <j...@wxcvbn.org> wrote:
>
> On Sat, May 25, 2024 at 12:35:16AM -0400, George Koehler wrote:
> > On Tue, 21 May 2024 03:08:49 +0200
> > Jeremie Courreges-Anglas <j...@wxcvbn.org> wrote:
> >
> > > On Tue, May 21, 2024 at 02:51:39AM +0200, Jeremie Courreges-Anglas wrote:
> > > > This doesn't look powerpc64-specific. It feels like
> > > > uvm_km_kmemalloc_pla() should call pmap_enter() with PMAP_CANFAIL and
> > > > unwind in case of a resource shortage.
> > >
> > > The diff below behaves when I inject fake pmap_enter() failures on
> > > amd64. It would be nice to test it on -stable and/or -current,
> > > depending on whether it happens on -stable only or also on -current.
> >
> > I believe that we have a powerpc64-specific problem, by which
> > pmap_enter of kernel memory fails on powerpc64 when it succeeds on
> > other platforms.
> >
> > powerpc64-1.ports.openbsd.org is a 16-core POWER9 where I run dpb(1)
> > to build packages. In December 2022, it got this panic,
> >
> > ddb{13}> show panic
> > cpu0: pmemrange allocation error: allocated 0 pages in 0 segments, but request was 1 pages in 1 segments
> > cpu12: kernel diagnostic assertion "*start_ptr == uvm_map_entrybyaddr(atree, addr)" failed: file "/usr/src/sys/uvm/uvm_map.c", line 594
> > *cpu13: pmap_enter: failed to allocate pted
> >
> > A panic on some cpu can cause extra panics on other cpus, because some
> > events happen out of order:
> >  - The first cpu sends an IPI to each other cpu to go into ddb,
> >    before it disables the locks.
> >  - Some other cpu sees the locks being disabled, before it receives
> >    the IPI to go into ddb. The cpu skips acquiring some lock and
> >    trips on corrupt memory, perhaps by failing an assertion, or by
> >    dereferencing a poisoned pointer (powerpc64 trap type 300).
>
> ack, thanks for making this clearer.
>
> > I type "show panic" and try to find the original panic and ignore the
> > extra panics.
> >
> > The same 16-core POWER9, in May 2023, got this panic,
> >
> > ddb{11}> show panic
> > *cpu11: pmap_enter: failed to allocate pted
> > ddb{11}> trace
> > panic+0x134
> > pmap_enter+0x20c
> > uvm_km_kmemalloc_pla+0x1f8
> > uvm_uarea_alloc+0x70
> > fork1+0x23c
> > syscall+0x380
> > trap+0x5dc
> > trapagain+0x4
> > --- syscall (number 2) ---
> > End of kernel: 0xbffff434aa7bac60 lr 0xd165eb228594
> > ddb{11}> show struct uvm_km_pages uvm_km_pages
> > struct uvm_km_pages at 0x1c171b8 (65592 bytes) {mtx = {mtx_owner =
> > (volatile void *)0x0, mtx_wantipl = 0x7, mtx_oldipl = 0x0}, lowat =
> > 0x200, hiwat = 0x2000, free = 0x0, page = 13835058060646207488,
> > freelist = (struct uvm_km_free_page *)0x0, freelistlen = 0x0, km_proc
> > = (struct proc *)0xc00000011426eb00}
> >
> > My habit was "show struct uvm_km_pages uvm_km_pages", because these
> > panics always have uvm_km_pages.free == 0, which causes
> > pool_get(&pmap_pted_pool, _) to fail and return NULL, which causes
> > pmap_enter to panic "failed to allocate pted".
> >
> > It would not fail if uvm_km_thread can run and add more free pages to
> > uvm_km_pages. I would want uvm_km_kmemalloc_pla to sleep (so
> > uvm_km_thread can run), but maybe I can't sleep during uvm_uarea_alloc
> > in the middle of a fork.
>
> IIUC uvm_uarea_alloc() calls uvm_km_kmemalloc_pla() without
> UVM_KMF_NOWAIT/UVM_KMF_TRYLOCK, so it should be ok with another
> potential sleeping point. But pmap_enter() doesn't take a flag that
> allows sleeping.
>
> > (We have uvm_km_pages only if the platform
> > has no direct map: powerpc64 has uvm_km_pages, amd64 doesn't.)
> >
> > In platforms other than powerpc64, pmap_enter(pmap_kernel(), _) does
> > not allocate. For example, macppc's powerpc/pmap.c allocates every
> > kernel pted at boot.
>
> Maybe this is a better approach.
> No idea if it was a deliberate choice though.
>
> > My 4-core POWER9 at home never reproduced this panic, perhaps because
> > 4 cores are too few to take free pages out of uvm_km_pages faster than
> > uvm_km_thread can add them. The 16-core POWER9 has not reproduced
> > "failed to allocate pted" in recent months.
> >
> > --gkoehler
> >
>
> --
> jca
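
For illustration only, here is a minimal userland sketch of the "PMAP_CANFAIL and unwind" idea quoted above, assuming a fixed pool of free pages standing in for uvm_km_pages.free. It is not the diff from the thread and not kernel code: every toy_* name and TOY_CANFAIL are hypothetical stand-ins, and the real pmap_enter() and uvm_km_kmemalloc_pla() are considerably more involved.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_CANFAIL 0x1  /* hypothetical stand-in for PMAP_CANFAIL */
#define TOY_NPAGES  8    /* size of the toy address range */

/* Toy stand-in for uvm_km_pages.free: pages left for new pteds. */
int toy_free_pages;

/* Toy mapping table: nonzero means the page is currently mapped. */
int toy_mapped[TOY_NPAGES];

/*
 * Map one page.  Returns 0 on success.  On shortage it returns ENOMEM
 * if the caller passed TOY_CANFAIL, and -1 otherwise (the case where
 * the kernel in the thread panics with "failed to allocate pted").
 */
int
toy_pmap_enter(size_t page, int flags)
{
	assert(page < TOY_NPAGES);
	if (toy_free_pages == 0)
		return (flags & TOY_CANFAIL) ? ENOMEM : -1;
	toy_free_pages--;
	toy_mapped[page] = 1;
	return 0;
}

/* Unmap one page and give its backing page back to the pool. */
void
toy_pmap_remove(size_t page)
{
	assert(page < TOY_NPAGES);
	if (toy_mapped[page]) {
		toy_mapped[page] = 0;
		toy_free_pages++;
	}
}

/*
 * Map pages [0, npages).  If any mapping fails, undo the mappings
 * already made and report ENOMEM instead of panicking.
 */
int
toy_kmemalloc(size_t npages)
{
	size_t i;

	if (npages > TOY_NPAGES)
		return EINVAL;
	for (i = 0; i < npages; i++) {
		if (toy_pmap_enter(i, TOY_CANFAIL) != 0) {
			while (i > 0)
				toy_pmap_remove(--i);
			return ENOMEM;
		}
	}
	return 0;
}
```

With toy_free_pages set to 2, toy_kmemalloc(4) returns ENOMEM and leaves toy_free_pages at 2 with nothing mapped, so the caller can retry or sleep instead of taking the panic path. Whether the caller may actually sleep at that point is the open question in the thread.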