On Wed, Feb 15, 2023 at 03:54:11PM +0100, Jan Beulich wrote: > While the PAE-extended-CR3 VM assist is a 32-bit only concept, it still > applies to guests also when run on a 64-bit hypervisor: The "extended > CR3" format has to be used there as well, to fit the address in the only > 32-bit wide register there. As a result it was a mistake that the check > was never enabled for that case, and was then mistakenly deleted in the > course of removal of 32-bit-Xen code (218adf199e68 ["x86: We can assume > CONFIG_PAGING_LEVELS==4"]). > > Similarly during Dom0 construction kernel awareness needs to be taken > into account, and respective code was again mistakenly never enabled for > 32-bit Dom0 when running on 64-bit Xen (and thus wrongly deleted by > 5d1181a5ea5e ["xen: Remove x86_32 build target"]). > > At the same time restrict enabling of the assist for Dom0 to just the > 32-bit case. Furthermore there's no need for an atomic update there. > > Signed-off-by: Jan Beulich <jbeul...@suse.com> > --- > I was uncertain whether to add a check to the CR3 guest read path, > raising e.g. #GP(0) when the value read wouldn't fit but also may not > be converted to "extended" format (overflow is possible there in > principle because of the control tools "slack" in promote_l3_table()). > > In that context I was puzzled to find no check on the CR3 guest write > path even in 4.2: A guest (bogusly) setting the PCD or PWT bits (or any > of the low reserved ones) could observe anomalous behavior rather than > plain failure. > > As to a Fixes: tag - it's pretty unclear which of the many original > 32-on-64 changes to blame. I don't think the two cited commits should > be referenced there, as they didn't break anything that wasn't already > broken. > > --- a/xen/arch/x86/mm.c > +++ b/xen/arch/x86/mm.c > @@ -1520,6 +1520,23 @@ static int promote_l3_table(struct page_ > unsigned int partial_flags = page->partial_flags; > l3_pgentry_t l3e = l3e_empty(); > > + /* > + * PAE pgdirs above 4GB are unacceptable if a 32-bit guest does not > + * understand the weird 'extended cr3' format for dealing with high-order > + * address bits. We cut some slack for control tools (before vcpu0 is > + * initialised).
Don't we then need some check in the vCPU init path to assure that the cr3 is < 32bits if we allow those to initially be set? Or will the initialization unconditionally overwrite any previous cr3 value? Thanks, Roger.