On 21/10/2024 10:53 am, Andrew Cooper wrote:
> On 21/10/2024 1:45 am, Daniel P. Smith wrote:
>> To start transitioning consider_modules() over to struct boot_module, begin
>> with taking the array of struct boot_modules but use the temporary struct
>> element mod.
>>
>> No functional change intended.
>>
>> Signed-off-by: Daniel P. Smith <dpsm...@apertussolutions.com>
>> Reviewed-by: Jason Andryuk <jason.andr...@amd.com>
> I've taken patch 1 (currently in testing to push).
>
> However, this change breaks PV Shim (same test case that I debugged the
> OSSTest failure with on Saturday).
>
> (XEN) PVH start info: (pa 0000ffc0)
> (XEN)   version:    1
> (XEN)   flags:      0
> (XEN)   nr_modules: 2
> (XEN)   modlist_pa: 000000000000ff60
> (XEN)   cmdline_pa: 000000000000ffa0
> (XEN)   cmdline:    'pv-shim console=xen,pv'
> (XEN)   rsdp_pa:    00000000fc008000
> (XEN)     mod[0].pa:         0000000001042000
> (XEN)     mod[0].size:       0000000006907440
> (XEN)     mod[0].cmdline_pa: 0000000000000000
> (XEN)     mod[1].pa:         00000000016d9000
> (XEN)     mod[1].size:       0000000019870751
> (XEN)     mod[1].cmdline_pa: 0000000000000000
> (XEN) Bootloader: PVH Directboot
> (XEN) Command line: pv-shim console=xen,pv
> (XEN) Xen image load base address: 0
> ...
> (XEN) ----[ Xen-4.20-unstable  x86_64  debug=y ubsan=y  Not tainted ]----
> (XEN) CPU:    0
> (XEN) RIP:    e008:[<ffff82d040a51d38>]
> arch/x86/pv/dom0_build.c#dom0_construct+0x2020/0x24eb
> ...
> (XEN) Xen call trace:
> (XEN)    [<ffff82d040a51d38>] R
> arch/x86/pv/dom0_build.c#dom0_construct+0x2020/0x24eb
> (XEN)    [<ffff82d040a52222>] F dom0_construct_pv+0x1f/0xcd
> (XEN)    [<ffff82d040a7bb27>] F construct_dom0+0xad/0xc0
> (XEN)    [<ffff82d040a6eb3f>] F __start_xen+0x50f2/0x53aa
> (XEN)    [<ffff82d040206184>] F __high_start+0x94/0xa0
> ...
> (XEN) Pagetable walk from ffff8287fffffff8:
> (XEN)  L4[0x105] = 000000001fa17063 ffffffffffffffff
> (XEN)  L3[0x01f] = 0000000000000000 ffffffffffffffff
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 0:
> (XEN) FATAL PAGE FAULT
> (XEN) [error_code=0000]
> (XEN) Faulting linear address: ffff8287fffffff8
> (XEN) ****************************************
>
> We've got a wild access, but it's not clear exactly what's going on here.
>
> addr2line says it's line 904 which is in the middle of the
> page_list_for_each() loop setting up the physmap.
>
> My gut feeling is that the pointer -> array conversion in
> consider_modules() isn't correct, because that's literally the only
> non-trivial change, but I can't spot anything concretely wrong...

To follow up, this isn't really the fault of this patch.

It's a hindenbug left by prior patches.  Fix at
https://lore.kernel.org/xen-devel/20241022124114.84498-1-andrew.coop...@citrix.com/T/#u

~Andrew

Reply via email to