On Sat, May 09, 2020 at 06:52:24PM +0300, Andriy Gapon wrote:
> On 08/05/2020 19:15, Konstantin Belousov wrote:
> > On Fri, May 08, 2020 at 06:53:24PM +0300, Andriy Gapon wrote:
> >>
> >> I have a reproducible panic with a custom kernel without option NUMA while
> >> using amdgpu driver from linuxkpi-based drm:
> >>
> >> panic: address 41ec00000 beyond the last segment
> >>
> >> I did some quick debugging and the panic happens when Xorg server tries to
> >> access a frame buffer (or something like that). There is a page fault that
> >> gets satisfied by ttm with a fictitious page.
> >>
> >> The stack trace is:
> >> #11 0xffffffff808031a3 in panic (fmt=0xffffffff8119a998 <cnputs_mtx> "5\003ʀ\377\377\377\377") at /usr/devel/git/motil/sys/kern/kern_shutdown.c:839
> >> #12 0xffffffff80bbc552 in pmap_enter (pmap=<optimized out>, va=34504441856, m=<optimized out>, prot=<optimized out>, flags=<optimized out>, psind=<optimized out>) at /usr/devel/git/motil/sys/amd64/amd64/pmap.c:6035
> >> #13 0xffffffff80b288be in vm_fault_populate (fs=<optimized out>) at /usr/devel/git/motil/sys/vm/vm_fault.c:519
> >> #14 vm_fault_allocate (fs=<optimized out>) at /usr/devel/git/motil/sys/vm/vm_fault.c:1032
> >> #15 vm_fault (map=<optimized out>, vaddr=<optimized out>, fault_type=<optimized out>, fault_flags=<optimized out>, m_hold=<optimized out>) at /usr/devel/git/motil/sys/vm/vm_fault.c:1342
> >> #16 0xffffffff80b26e7e in vm_fault_trap (map=0xfffffe0017cd39e8, vaddr=<optimized out>, fault_type=<optimized out>, fault_flags=0, signo=0xfffffe00a810dbc4, ucode=0xfffffe00a810dbc0) at /usr/devel/git/motil/sys/vm/vm_fault.c:589
> >> #17 0xffffffff80bcf89c in trap_pfault (frame=0xfffffe00a810dc00, usermode=<optimized out>, signo=<optimized out>, ucode=0xffffffff80853250 <putchar>) at /usr/devel/git/motil/sys/amd64/amd64/trap.c:821
> >> #18 0xffffffff80bceeec in trap (frame=0xfffffe00a810dc00) at /usr/devel/git/motil/sys/amd64/amd64/trap.c:34
> >>
> >>
> >> The line number in pmap_enter() is incorrect, I guess because of
> >> optimizations.
> >> The assert seems to be reached via pmap_enter -> CHANGE_PV_LIST_LOCK_TO_PHYS
> >> -> PHYS_TO_PV_LIST_LOCK -> pa_index().
> >>
> >> The panic is correct in that the page is fictitious and its physical
> >> address is beyond the end of real physical memory.
> >> It seems that the NUMA PHYS_TO_PV_LIST_LOCK() is aware of such pages, but
> >> the !NUMA one is not.
> >
> > I think you can remove this assert. pa_index() is always taken by
> > % NPV_LIST_LOCKS, because fictitious mappings are not promoted.
> >
> > Try that and commit if it works for you.
>
> I tried this change:
> diff --git a/sys/amd64/amd64/pmap.c b/sys/amd64/amd64/pmap.c
> index 4deed86a76d1a..b834b7f0388b7 100644
> --- a/sys/amd64/amd64/pmap.c
> +++ b/sys/amd64/amd64/pmap.c
> @@ -345,7 +345,7 @@ pmap_pku_mask_bit(pmap_t pmap)
> #define NPV_LIST_LOCKS MAXCPU
>
> #define PHYS_TO_PV_LIST_LOCK(pa) \
> - (&pv_list_locks[pa_index(pa) % NPV_LIST_LOCKS])
> + (&pv_list_locks[((pa) >> PDRSHIFT) % NPV_LIST_LOCKS])
> #endif
>
> #define CHANGE_PV_LIST_LOCK_TO_PHYS(lockp, pa) do { \
>
> It fixed the original problem, but I got a new panic.
> "DI already started" in pmap_remove() -> pmap_delayed_invl_start_u().
> I guess that the !NUMA variant does not get much testing, so I'll probably just
> stick with the default.
Why didn't you just remove the KASSERT from pa_index()?
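
For illustration only, here is a minimal user-space sketch of the index
expression used in the patch above. It is not the kernel code: PDRSHIFT = 21
and NPV_LIST_LOCKS = 256 are assumed values, vm_paddr_t is a local stand-in
for the kernel type, and pv_lock_index() and pv_lock_index_check() are
hypothetical helpers. It shows only that the modulo keeps the lock index in
range even for a physical address beyond the last segment, such as the one
in the panic message.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration; the real ones come from kernel headers. */
#define PDRSHIFT        21      /* assumed: log2 of a 2 MB superpage */
#define NPV_LIST_LOCKS  256     /* assumed: stands in for MAXCPU */

typedef uint64_t vm_paddr_t;    /* local stand-in for the kernel type */

/* Hypothetical helper mirroring the index expression from the patch. */
static inline unsigned
pv_lock_index(vm_paddr_t pa)
{
	return ((unsigned)((pa >> PDRSHIFT) % NPV_LIST_LOCKS));
}

/* The address from the panic message still maps to a valid lock slot. */
static inline void
pv_lock_index_check(void)
{
	assert(pv_lock_index(0x41ec00000ULL) < NPV_LIST_LOCKS);
}
```

Under these assumptions any pa, fictitious or not, selects one of the
existing locks, so only the range check is at issue, not the array access.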