openbsd-ppc64-n2vi got another crash:

UVM_PSEG_INUSE failed     uvm_pager.c:227

panic
uvm_pseg_release
uvn_io
uvn_get
uvm_fault_lower
uvm_fault
trap
trapagain
  type 300

during a bunch of go compiles.

On Mon, May 27, 2024 at 5:34 PM Jeremie Courreges-Anglas <j...@wxcvbn.org> wrote:
>
> On Sat, May 25, 2024 at 12:35:16AM -0400, George Koehler wrote:
> > On Tue, 21 May 2024 03:08:49 +0200
> > Jeremie Courreges-Anglas <j...@wxcvbn.org> wrote:
> >
> > > On Tue, May 21, 2024 at 02:51:39AM +0200, Jeremie Courreges-Anglas wrote:
> > > > This doesn't look powerpc64-specific.  It feels like
> > > > uvm_km_kmemalloc_pla() should call pmap_enter() with PMAP_CANFAIL and
> > > > unwind in case of a resource shortage.
> > >
> > > The diff below behaves when I inject fake pmap_enter() failures on
> > > amd64.  It would be nice to test it on -stable and/or -current,
> > > depending on whether it happens on -stable only or also on -current.
> >
> > I believe that we have a powerpc64-specific problem, by which
> > pmap_enter of kernel memory fails on powerpc64 when it succeeds on
> > other platforms.
> >
> > powerpc64-1.ports.openbsd.org is a 16-core POWER9 where I run dpb(1)
> > to build packages.  In December 2022, it got this panic,
> >
> > ddb{13}> show panic
> >  cpu0: pmemrange allocation error: allocated 0 pages in 0 segments, but request was 1 pages in 1 segments
> >  cpu12: kernel diagnostic assertion "*start_ptr == uvm_map_entrybyaddr(atree, addr)" failed: file "/usr/src/sys/uvm/uvm_map.c", line 594
> > *cpu13: pmap_enter: failed to allocate pted
> >
> > A panic on some cpu can cause extra panics on other cpus, because
> > some events happen out of order:
> >  - The first cpu sends an IPI to each other cpu to go into ddb,
> >    before it disables the locks.
> >  - Some other cpu sees the locks being disabled, before it receives
> >    the IPI to go into ddb.  The cpu skips acquiring some lock and
> >    trips on corrupt memory, perhaps by failing an assertion, or by
> >    dereferencing a poisoned pointer (powerpc64 trap type 300).
>
> ack, thanks for making this clearer.
>
> > I type "show panic" and try to find the original panic and ignore the
> > extra panics.
> >
> > The same 16-core POWER9, in May 2023, got this panic,
> >
> > ddb{11}> show panic
> > *cpu11: pmap_enter: failed to allocate pted
> > ddb{11}> trace
> > panic+0x134
> > pmap_enter+0x20c
> > uvm_km_kmemalloc_pla+0x1f8
> > uvm_uarea_alloc+0x70
> > fork1+0x23c
> > syscall+0x380
> > trap+0x5dc
> > trapagain+0x4
> > --- syscall (number 2) ---
> > End of kernel: 0xbffff434aa7bac60 lr 0xd165eb228594
> > ddb{11}> show struct uvm_km_pages uvm_km_pages
> > struct uvm_km_pages at 0x1c171b8 (65592 bytes) {mtx = {mtx_owner =
> > (volatile void *)0x0, mtx_wantipl = 0x7, mtx_oldipl = 0x0}, lowat =
> > 0x200, hiwat = 0x2000, free = 0x0, page = 13835058060646207488,
> > freelist = (struct uvm_km_free_page *)0x0, freelistlen = 0x0, km_proc
> > = (struct proc *)0xc00000011426eb00}
> >
> > My habit was "show struct uvm_km_pages uvm_km_pages", because these
> > panics always have uvm_km_pages.free == 0, which causes
> > pool_get(&pmap_pted_pool, _) to fail and return NULL, which causes
> > pmap_enter to panic "failed to allocate pted".
> >
> > It would not fail if uvm_km_thread can run and add more free pages to
> > uvm_km_pages.  I would want uvm_km_kmemalloc_pla to sleep (so
> > uvm_km_thread can run), but maybe I can't sleep during uvm_uarea_alloc
> > in the middle of a fork.
>
> IIUC uvm_uarea_alloc() calls uvm_km_kmemalloc_pla() without
> UVM_KMF_NOWAIT/UVM_KMF_TRYLOCK, so it should be ok with another
> potential sleeping point.  But pmap_enter() has no flag that
> allows it to sleep.
>
> > (We have uvm_km_pages only if the platform
> > has no direct map: powerpc64 has uvm_km_pages, amd64 doesn't.)
> >
> > In platforms other than powerpc64, pmap_enter(pmap_kernel(), _) does
> > not allocate.  For example, macppc's powerpc/pmap.c allocates every
> > kernel pted at boot.
>
> Maybe this is a better approach.  No idea if it was a deliberate
> choice though.
>
> > My 4-core POWER9 at home never reproduced this panic, perhaps because
> > 4 cores are too few to take free pages out of uvm_km_pages faster than
> > uvm_km_thread can add them.  The 16-core POWER9 has not reproduced
> > "failed to allocate pted" in recent months.
> >
> > --gkoehler
> >
>
> --
> jca
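
For readers following the "PMAP_CANFAIL and unwind" idea quoted above, here
is a minimal user-space sketch of the pattern. It is not the diff from this
thread and not kernel code: model_enter(), model_remove(), model_map_range(),
the mapped[] array and pted_budget are all hypothetical names that only model
"entering a mapping may fail when a resource runs out, so the caller removes
the mappings it already made and returns an error instead of panicking".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES_MAX 64

/* Hypothetical stand-in for a page table: mapped[i] is true once page i is entered. */
static bool mapped[NPAGES_MAX];

/* Hypothetical resource budget, standing in for entries the allocator can still hand out. */
static int pted_budget;

/* Models a "can fail" enter: returns 0 on success, -1 on shortage (no panic). */
static int
model_enter(size_t page)
{
	assert(page < NPAGES_MAX);
	if (pted_budget == 0)
		return -1;
	pted_budget--;
	mapped[page] = true;
	return 0;
}

/* Models removing one mapping and giving its resource back. */
static void
model_remove(size_t page)
{
	assert(page < NPAGES_MAX);
	if (mapped[page]) {
		mapped[page] = false;
		pted_budget++;
	}
}

/*
 * Map pages [start, start + n).  If any enter fails, unwind the
 * pages already entered by this call and report the failure.
 */
static int
model_map_range(size_t start, size_t n)
{
	size_t i;

	if (start > NPAGES_MAX || n > NPAGES_MAX - start)
		return -1;
	for (i = 0; i < n; i++) {
		if (model_enter(start + i) != 0) {
			while (i-- > 0)
				model_remove(start + i);
			return -1;
		}
	}
	return 0;
}

/* Count how many pages are currently mapped in the model. */
static size_t
model_mapped_count(void)
{
	size_t i, count = 0;

	for (i = 0; i < NPAGES_MAX; i++)
		if (mapped[i])
			count++;
	return count;
}
```

With pted_budget set to 3, a request for 5 pages fails and leaves nothing
mapped, while a later request for 3 pages succeeds. Whether the real caller
should then sleep and retry or return an error is the open question in the
quoted discussion; the sketch takes no position on that.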
