On Wed, Jul 02, 2025 at 10:54:24AM +0200, Orzel, Michal wrote:
> 
> 
> On 02/07/2025 10:26, Roger Pau Monné wrote:
> > On Wed, Jul 02, 2025 at 09:52:45AM +0200, Orzel, Michal wrote:
> >>
> >>
> >> On 02/07/2025 09:00, Roger Pau Monné wrote:
> >>> On Tue, Jul 01, 2025 at 01:46:19PM -0700, Stefano Stabellini wrote:
> >>>> On Tue, 1 Jul 2025, Jan Beulich wrote:
> >>>>> Sadly from this you omitted the output from the setup of the offsets
> >>>>> arrays. Considering also your later reply, I'd be curious to know what
> >>>>> mfn_to_pdx(0x50000000) is.
> >>>>  
> >>>> Full logs here, and debug patch in attachment.
> >>>>
> >>>> (XEN) Checking for initrd in /chosen
> >>>> (XEN) RAM: 0000000000000000 - 000000007fffffff
> >>>> (XEN) RAM: 0000000800000000 - 000000087fffffff
> >>>> (XEN) RAM: 0000050000000000 - 000005007fffffff
> >>>> (XEN) RAM: 0000060000000000 - 000006007fffffff
> >>>> (XEN) RAM: 0000070000000000 - 000007007fffffff
> >>>> (XEN) 
> >>>> (XEN) MODULE[0]: 0000000022000000 - 0000000022172fff Xen         
> >>>> (XEN) MODULE[1]: 0000000022200000 - 000000002220efff Device Tree 
> >>>> (XEN) MODULE[2]: 0000000020400000 - 0000000021e2ffff Kernel      
> >>>> (XEN)  RESVD[0]: 0000000000000000 - 0000000000ffffff
> >>>> (XEN)  RESVD[1]: 0000000001000000 - 00000000015fffff
> >>>> (XEN)  RESVD[2]: 0000000001600000 - 00000000017fffff
> >>>> (XEN)  RESVD[3]: 0000000001800000 - 00000000097fffff
> >>>> (XEN)  RESVD[4]: 0000000009800000 - 000000000bffffff
> >>>> (XEN)  RESVD[5]: 0000000011126000 - 000000001114dfff
> >>>> (XEN)  RESVD[6]: 000000001114e000 - 000000001214efff
> >>>> (XEN)  RESVD[7]: 0000000017275000 - 000000001729cfff
> >>>> (XEN)  RESVD[8]: 000000001729d000 - 000000001829dfff
> >>>> (XEN)  RESVD[9]: 000000001a7df000 - 000000001a806fff
> >>>> (XEN)  RESVD[10]: 000000001a807000 - 000000001b807fff
> >>>> (XEN)  RESVD[11]: 000000001d908000 - 000000001d92ffff
> >>>> (XEN)  RESVD[12]: 000000001d930000 - 000000001e930fff
> >>>> (XEN)  RESVD[13]: 000000001829e000 - 000000001869dfff
> >>>> (XEN)  RESVD[14]: 000000001869e000 - 00000000186ddfff
> >>>> (XEN)  RESVD[15]: 0000000800000000 - 000000083fffffff
> >>>> (XEN) 
> >>>> (XEN) 
> >>>> (XEN) Command line: console=dtuart dom0_mem=2048M 
> >>>> console_timestamps=boot debug bootscrub=0 vwfi=native sched=null
> >>>> (XEN) [00000006bfc302ec] parameter "debug" unknown!
> >>>> (XEN) [00000006bfcc0476] DEBUG init_pdx 294 start=0 end=80000000
> >>>> (XEN) [00000006bfcd2400] DEBUG init_pdx 294 start=800000000 end=880000000
> >>>> (XEN) [00000006bfce29ec] DEBUG init_pdx 294 start=50000000000 
> >>>> end=50080000000
> >>>> (XEN) [00000006bfcf1768] DEBUG init_pdx 294 start=60000000000 
> >>>> end=60080000000
> >>>> (XEN) [00000006bfd015a4] DEBUG init_pdx 294 start=70000000000 
> >>>> end=70080000000
> >>>> (XEN) [00000006bfd1444f] DEBUG setup_mm 252
> >>>> (XEN) [00000006bfd3dc6f] DEBUG setup_mm 273 start=0 size=80000000 
> >>>> ram_end=80000000 directmap_base_pdx=0
> >>>> (XEN) [00000006bfd5616e] DEBUG setup_directmap_mappings 229 base_mfn=0 
> >>>> nr_mfns=80000 directmap_base_pdx=0 mfn_to_pdx=0
> >>>> (XEN) [00000006bfd7d38a] DEBUG setup_directmap_mappings 237 base_mfn=0 
> >>>> nr_mfns=80000 directmap_base_pdx=0
> >>>> (XEN) [00000006bfd92728] DEBUG setup_mm 273 start=800000000 
> >>>> size=80000000 ram_end=880000000 directmap_base_pdx=0
> >>>> (XEN) [00000006bfdaba3b] DEBUG setup_directmap_mappings 229 
> >>>> base_mfn=800000 nr_mfns=80000 directmap_base_pdx=0 mfn_to_pdx=800000
> >>>> (XEN) [00000006bfdcd79c] DEBUG setup_directmap_mappings 237 
> >>>> base_mfn=800000 nr_mfns=80000 directmap_base_pdx=0
> >>>> (XEN) [00000006bfde4d82] DEBUG setup_mm 273 start=50000000000 
> >>>> size=80000000 ram_end=50080000000 directmap_base_pdx=0
> >>>> (XEN) [00000006bfdfaef0] DEBUG setup_directmap_mappings 229 
> >>>> base_mfn=50000000 nr_mfns=80000 directmap_base_pdx=0 mfn_to_pdx=50000000
> >>>> (XEN) [00000006bfe35249] Assertion '(mfn_to_pdx(maddr_to_mfn(ma)) - 
> >>>> directmap_base_pdx) < (DIRECTMAP_SIZE >> PAGE_SHIFT)' failed at 
> >>>> ./arch/arm/include/asm/mmu/mm.h:72
> >>>
> >>> As said on the other reply, the issue here is that with the v2 PDX
> >>> offset compression logic your memory map is not compressible, and this
> >>> leads to an overflow, as anything above 5TiB won't fit in the
> >>> directmap AFAICT.  We already discussed with Jan that ARM seems to be
> >>> missing any logic to account for the max addressable page:
> >>>
> >>> https://lore.kernel.org/xen-devel/9074f1a6-a605-43f4-97f3-d0a626252...@suse.com/
> >>>
> >>> x86 has setup_max_pdx() that truncates the maximum addressable MFN
> >>> based on the active PDX compression and the virtual memory map
> >>> restrictions.  ARM needs similar logic to account for this
> >>> restrictions.
> >>
> >> We have a few issues on Arm. First, we don't check whether direct map is 
> >> big
> >> enough provided max_pdx that we don't set at all. Second, we don't really 
> >> use
> >> PDX grouping (can be also used without compression). My patch (that Stefano
> >> attached previously) fixes the second issue (Allejandro will take it over 
> >> to
> >> come up with common solution).
> > 
> > You probably can handle those as different issues, as PDX grouping is
> > completely disjoint from PDX compression.  It might be helpful if
> > we could split the PDX grouping into a separate file from the PDX
> > compression.
> > 
> > One weirdness I've noticed with ARM is the addition of start offsets
> > to the existing PDX compression, by using directmap_base_pdx,
> > directmap_mfn_start, directmap_base_pdx &c.  I'm not sure whether this will
> > interfere with the PDX compression, but it looks like a bodge.  This
> > should be part of the generic PDX compression implementation, not an
> > extra added on a per-arch basis.
> > 
> > FWIW, PDX offset translation should already compress any gaps from 0
> > to the first RAM range, and hence this won't be needed (in fact it
> > would just make ARM translations slower by doing an extra unneeded
> > operation).  My recommendation would be to move this initial offset
> > compression inside the PDX mask translation.
> > 
> >> For the first issue, we need to know max_page (at
> >> the moment we calculate it in setup_mm() at the very end but we could do 
> >> it in
> >> init_pdx() to know it ahead of setting direct map) and PDX offset (on x86 
> >> there
> >> is no offset). I also think that on Arm we should just panic if direct map 
> >> is
> >> too small.
> > 
> > Hm, that's up to the ARM folks, but my opinion is that you should
> > simply ignore memory above the threshold.  Panicking should IMO be a
> > last resort option when there's no way to workaround the issue.
> On Arm we handle user errors and suspicious behavior usually as panics as 
> oppose
> to x86 which is more liberal in that regard. We want to fail as soon as 
> possible.
> 
> > 
> >> The issue can be reproduced by disabling PDX compression, so not only with
> >> Roger's patch.
> >>
> >> @Julien, I'm thinking of something like this:
> >>
> >> diff --git a/xen/arch/arm/arm32/mmu/mm.c b/xen/arch/arm/arm32/mmu/mm.c
> >> index 4d22f35618aa..e6d9b49acd3c 100644
> >> --- a/xen/arch/arm/arm32/mmu/mm.c
> >> +++ b/xen/arch/arm/arm32/mmu/mm.c
> >> @@ -190,7 +190,6 @@ void __init setup_mm(void)
> >>
> >>      /* Frame table covers all of RAM region, including holes */
> >>      setup_frametable_mappings(ram_start, ram_end);
> >> -    max_page = PFN_DOWN(ram_end);
> >>
> >>      /*
> >>       * The allocators may need to use map_domain_page() (such as for
> >> diff --git a/xen/arch/arm/arm64/mmu/mm.c b/xen/arch/arm/arm64/mmu/mm.c
> >> index a0a2dd8cc762..3e64be6ae664 100644
> >> --- a/xen/arch/arm/arm64/mmu/mm.c
> >> +++ b/xen/arch/arm/arm64/mmu/mm.c
> >> @@ -224,6 +224,9 @@ static void __init setup_directmap_mappings(unsigned 
> >> long
> >> base_mfn,
> >>           */
> >>          directmap_virt_start = DIRECTMAP_VIRT_START +
> >>              (base_mfn - mfn_gb) * PAGE_SIZE;
> >> +
> >> +        if ( (max_pdx - directmap_base_pdx) > (DIRECTMAP_SIZE >> 
> >> PAGE_SHIFT) )
> >> +            panic("Direct map is too small\n");
> > 
> > As said above - I would avoid propagating the usage of those offsets
> > into generic memory management code, it's usage should be confined
> > inside the translation functions.
> directmap_base_pdx is set a few lines above, so I would not call it 
> propagation.
> 
> > 
> > Here you probably want to use maddr_to_virt() or similar.
> I can't because maddr_to_virt() has the ASSERT with similar check.
> > 
> > You can maybe pickup:
> > 
> > https://lore.kernel.org/xen-devel/20250611171636.5674-3-roger....@citrix.com/
> > 
> > And attempt to hook it into ARM?
> As said above, we have different ways to approach setting max_pdx. On Arm we
> want to panic, on x86 you want to limit the max_pdx.
> 
> > 
> > I don't think it would that difficult to reduce the consumption of
> > memory map ranges to what Xen can handle.
> > 
> > Thanks, Roger.
> 
> The diff I sent fixes the issue for direct map now. We can take it now if we
> want to solve the issue. If we instead want to wait for frametable fixes (\wrt
> grouping) and possible PDX changes (making offsets common) to be done first, I
> can simply park this patch.

No please, don't park it just because of my opinions.  I think Julien
is OK with it, so don't hold back because of my x86 based opinion on
how to handle errors.

Regards, Roger.

Reply via email to