On Sat, May 09, 2020 at 11:33:40PM +0300, Andriy Gapon wrote:
> On 09/05/2020 19:50, Konstantin Belousov wrote:
> > On Sat, May 09, 2020 at 07:16:27PM +0300, Andriy Gapon wrote:
> >> On 09/05/2020 19:13, Konstantin Belousov wrote:
> >>> On Sat, May 09, 2020 at 06:52:24PM +0300, Andriy Gapon wrote:
> >>>> On 08/05/2020 19:15, Konstantin Belousov wrote:
> >>>>> On Fri, May 08, 2020 at 06:53:24PM +0300, Andriy Gapon wrote:
> >>>>>>
> >>>>>> I have a reproducible panic with a custom kernel without option NUMA
> >>>>>> while using amdgpu driver from linuxkpi-based drm:
> >>>>>>
> >>>>>> panic: address 41ec00000 beyond the last segment
> >>>>>>
> >>>>>> I did some quick debugging and the panic happens when Xorg server
> >>>>>> tries to access a frame buffer (or something like that). There is a
> >>>>>> page fault that gets satisfied by ttm with a fictitious page.
> >>>>>>
> >>>>>> The stack trace is:
> >>>>>> #11 0xffffffff808031a3 in panic (fmt=0xffffffff8119a998 <cnputs_mtx>
> >>>>>>     "5\003ʀ\377\377\377\377") at
> >>>>>>     /usr/devel/git/motil/sys/kern/kern_shutdown.c:839
> >>>>>> #12 0xffffffff80bbc552 in pmap_enter (pmap=<optimized out>,
> >>>>>>     va=34504441856, m=<optimized out>, prot=<optimized out>,
> >>>>>>     flags=<optimized out>, psind=<optimized out>) at
> >>>>>>     /usr/devel/git/motil/sys/amd64/amd64/pmap.c:6035
> >>>>>> #13 0xffffffff80b288be in vm_fault_populate (fs=<optimized out>) at
> >>>>>>     /usr/devel/git/motil/sys/vm/vm_fault.c:519
> >>>>>> #14 vm_fault_allocate (fs=<optimized out>) at
> >>>>>>     /usr/devel/git/motil/sys/vm/vm_fault.c:1032
> >>>>>> #15 vm_fault (map=<optimized out>, vaddr=<optimized out>,
> >>>>>>     fault_type=<optimized out>, fault_flags=<optimized out>,
> >>>>>>     m_hold=<optimized out>) at
> >>>>>>     /usr/devel/git/motil/sys/vm/vm_fault.c:1342
> >>>>>> #16 0xffffffff80b26e7e in vm_fault_trap (map=0xfffffe0017cd39e8,
> >>>>>>     vaddr=<optimized out>, fault_type=<optimized out>, fault_flags=0,
> >>>>>>     signo=0xfffffe00a810dbc4, ucode=0xfffffe00a810dbc0) at
> >>>>>>     /usr/devel/git/motil/sys/vm/vm_fault.c:589
> >>>>>> #17 0xffffffff80bcf89c in trap_pfault (frame=0xfffffe00a810dc00,
> >>>>>>     usermode=<optimized out>, signo=<optimized out>,
> >>>>>>     ucode=0xffffffff80853250 <putchar>) at
> >>>>>>     /usr/devel/git/motil/sys/amd64/amd64/trap.c:821
> >>>>>> #18 0xffffffff80bceeec in trap (frame=0xfffffe00a810dc00) at
> >>>>>>     /usr/devel/git/motil/sys/amd64/amd64/trap.c:34
> >>>>>>
> >>>>>>
> >>>>>> The line number in pmap_enter() is incorrect, I guess because of
> >>>>>> optimizations.
> >>>>>> The assert seems to be reached via pmap_enter ->
> >>>>>> CHANGE_PV_LIST_LOCK_TO_PHYS -> PHYS_TO_PV_LIST_LOCK -> pa_index().
> >>>>>>
> >>>>>> The panic is correct in that the page is fictitious and its physical
> >>>>>> address is beyond the end of real physical memory.
> >>>>>> It seems that NUMA PHYS_TO_PV_LIST_LOCK() is aware of such pages, but
> >>>>>> !NUMA one is not.
> >>>>>
> >>>>> I think you can remove this assert. pa_index() is always taken by
> >>>>> % NPV_LIST_LOCKS, because fictitious mappings are not promoted.
> >>>>>
> >>>>> Try that and commit if it works for you.
> >>>>
> >>>> I tried this change:
> >>>> diff --git a/sys/amd64/amd64/pmap.c b/sys/amd64/amd64/pmap.c
> >>>> index 4deed86a76d1a..b834b7f0388b7 100644
> >>>> --- a/sys/amd64/amd64/pmap.c
> >>>> +++ b/sys/amd64/amd64/pmap.c
> >>>> @@ -345,7 +345,7 @@ pmap_pku_mask_bit(pmap_t pmap)
> >>>>  #define NPV_LIST_LOCKS MAXCPU
> >>>>
> >>>>  #define PHYS_TO_PV_LIST_LOCK(pa) \
> >>>> -    (&pv_list_locks[pa_index(pa) % NPV_LIST_LOCKS])
> >>>> +    (&pv_list_locks[((pa) >> PDRSHIFT) % NPV_LIST_LOCKS])
> >>>>  #endif
> >>>>
> >>>>  #define CHANGE_PV_LIST_LOCK_TO_PHYS(lockp, pa) do { \
> >>>>
> >>>> It fixed the original problem, but I got a new panic.
> >>>> "DI already started" in pmap_remove() -> pmap_delayed_invl_start_u().
> >>>> I guess that !NUMA variant does not get much testing, so I'll probably
> >>>> just stick with the default.
> >>> Why didn't you just remove the KASSERT from pa_index ?
> >>
> >> Well, I thought it might be useful in the NUMA case.
> >> pa_index() definition is shared between both cases.
> > Maybe define the macro two times, for NUMA/non-NUMA. The non-NUMA case
> > does not need the assert, because users take it mod NPV_LIST_LOCKS.
> >
>
> I still don't see how that could help with "DI already started" panic.
Maybe not, or maybe it would help because of pmap_delayed_invl_genp().
But I would worry more about this 'already started' issue, because this
must not happen.

Can you remove the assert from the macro and provide a backtrace of the
'DI already started' panic?
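
For illustration, the non-NUMA lock selection discussed in this thread can be
modelled in user space. This is only a sketch under stated assumptions, not the
pmap.c code: PDRSHIFT = 21 and NPV_LIST_LOCKS = 256 are assumed values,
last_seg_end is a made-up stand-in for the end of the last physical segment,
and pa_index_checked() and pv_lock_index() are hypothetical names. It shows why
the modulo keeps the lock index in range even for a fictitious page whose
address lies past the last segment, while a checked index into a per-superpage
table would still want the assert.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration; the real ones come from kernel headers. */
#define PDRSHIFT        21      /* 2 MB superpage shift (assumption) */
#define NPV_LIST_LOCKS  256     /* stand-in for MAXCPU (assumption) */

/* Hypothetical end of the last physical memory segment: 16 GB. */
static const uint64_t last_seg_end = 0x400000000ULL;

/*
 * Checked variant: the result indexes a table sized for real memory,
 * so an address past the last segment must be rejected.
 */
static inline uint64_t
pa_index_checked(uint64_t pa)
{
	assert(pa <= last_seg_end);	/* models the KASSERT */
	return (pa >> PDRSHIFT);
}

/*
 * Unchecked variant for the lock array: the index is reduced modulo
 * the array size, so any physical address yields a valid slot.
 */
static inline unsigned
pv_lock_index(uint64_t pa)
{
	return ((unsigned)((pa >> PDRSHIFT) % NPV_LIST_LOCKS));
}
```

With these assumed values, pv_lock_index(0x41ec00000) evaluates to 246, which
is a valid slot in the lock array, whereas pa_index_checked() on the same
address would trip the assert.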