On 05/01/2023 4:05 pm, Jan Beulich wrote: > Like for the various HVM-only types, save a little bit of code by suitably > "masking" this type out when !PV32.
add/remove: 0/1 grow/shrink: 2/4 up/down: 544/-922 (-378) Function old new delta sh_map_and_validate_gl2e__guest_4 136 666 +530 sh_destroy_shadow 289 303 +14 sh_clear_shadow_entry__guest_4 178 176 -2 sh_validate_guest_entry 521 472 -49 sh_map_and_validate_gl2he__guest_4 136 2 -134 sh_remove_shadows 4757 4545 -212 validate_gl2e 525 - -525 Total: Before=3914702, After=3914324, chg -0.01% Marginal... > > Signed-off-by: Jan Beulich <jbeul...@suse.com> > --- > I wasn't really sure whether it would be worthwhile to also update the > "#else" part of shadow_size(). Doing so would be a little tricky, as the > type to return 0 for has no name right now; I'd need to move down the > #undef to allow for that. Thoughts? This refers to the straight deletion from sh_type_to_size[] ? I was confused by that at first. The shadow does have a size of 1. Might /* [SH_type_l2h_64_shadow] = 1, PV32 only */ work better? That leaves it clearly in there as a 1, but not needing any further ifdefary. > --- a/xen/arch/x86/mm/shadow/multi.c > +++ b/xen/arch/x86/mm/shadow/multi.c > @@ -859,13 +866,12 @@ do { > int _i; \ > int _xen = !shadow_mode_external(_dom); \ > shadow_l2e_t *_sp = map_domain_page((_sl2mfn)); \ > - ASSERT(mfn_to_page(_sl2mfn)->u.sh.type == SH_type_l2_64_shadow ||\ > - mfn_to_page(_sl2mfn)->u.sh.type == SH_type_l2h_64_shadow);\ > + ASSERT_VALID_L2(mfn_to_page(_sl2mfn)->u.sh.type); \ > for ( _i = 0; _i < SHADOW_L2_PAGETABLE_ENTRIES; _i++ ) \ > { \ > if ( (!(_xen)) \ > || !is_pv_32bit_domain(_dom) \ > - || mfn_to_page(_sl2mfn)->u.sh.type != SH_type_l2h_64_shadow \ > + || mfn_to_page(_sl2mfn)->u.sh.type == SH_type_l2_64_shadow \ Isn't this redundant with the ASSERT_VALID_L2() now? Per your other question, yes this desperately wants rearranging, but I would agree with it being another patch. I did previously play at trying to simplify the PV pagetable loops in a similar way. Code-gen wise, I think the L2 loops what to calculate an upper bound which is either 512, or compat_first_slot, while the L4 loops what an "if(i == 256) i += 7; continue;" rather than having LFENCE-ing predicates on each iteration. ~Andrew