On Thu, Mar 27, 2025 at 10:59:58AM +0100, Jan Beulich wrote:
> On 27.03.2025 10:50, Roger Pau Monné wrote:
> > On Thu, Mar 27, 2025 at 10:15:03AM +0100, Jan Beulich wrote:
> >> On 27.03.2025 09:21, Roger Pau Monné wrote:
> >>> On Tue, Mar 25, 2025 at 08:18:11AM +0100, Jan Beulich wrote:
> >>>> In particular if we're running virtualized, the underlying hypervisor
> >>>> (which may be another Xen) may not surface MTRRs, and offer PAT only.
> >>>
> >>> At least for Xen, I think we offer MTRR uniformly, even on PVH
> >>> guests?
> >>
> >> By default we do, but we discussed the option of offering PAT-only 
> >> environments
> >> beyond leaving it open whether people disabling MTRR support in their guest
> >> config are outside of supported terrain.
> >>
> >>>> Fixes: 5a281883cdc3 ("Hardcode many cpu features for x86/64 -- we know 
> >>>> 64-bit")
> >>>> Signed-off-by: Jan Beulich <jbeul...@suse.com>
> >>>
> >>> Acked-by: Roger Pau Monné <roger....@citrix.com>
> >>>
> >>> My main concern is whether the !mtrr path is still functional.  Have
> >>> you tried booting the resulting hypervisor with MTRR masked on CPUID?
> >>>
> >>> (or alternatively short-circuiting cpu_has_mtrr == 0?)
> >>
> >> I didn't think this would be a prereq here. If we boot in an environment 
> >> truly
> >> lacking MTRR, we'll crash on the first MTRR MSR access - none of those 
> >> accesses
> >> use the safe accessors.
> > 
> > Right, hopefully we don't have unprotected MTRR MSR accesses, so
> > cpu_has_mtrr == 0 should prevent those.
> 
> Actually we do, see my other patch just posted.

Oh, yes, that was one of my concerns, but I've failed to spot it.

> >> Since you asked, I tried booting with "cpuid=no-mtrr".
> >> As I'm doing this on a system with console, all I can say is that it takes
> >> minutes to reach the point where we'd start setting up Dom0 (which itself 
> >> then
> >> takes so long that I timed out waiting for it to make progress), due to all
> >> screen output becoming unbelievably slow after AP bringup. Surely 
> >> something's
> >> screwed somewhere, as VRAM accesses being slow (or fast) shouldn't depend 
> >> on AP
> >> bringup having completed. I actually suspect it's not just VRAM accesses 
> >> which
> >> are slow, but that we're left running in uncachable mode altogether for 
> >> whatever
> >> reason.
> >>
> >> What this maybe useful for is to figure out the reason of "Platform timer
> >> appears to have unexpectedly wrapped <N> times", which I saw appear once.
> >>
> >> Given this, I'm actually uncertain whether it is legitimate then to take 
> >> your
> >> ack.
> > 
> > I think it might be best if we can figure out what causes those issues
> > (and possibly fix them) before taking this patch?
> > 
> > Albeit you could argue that running excruciatingly slow is better than
> > just crashing of an unhandled #GP from a rdmsr.
> 
> Indeed that's my thinking. But if you prefer, I can wait with this patch until
> after the other one has gone in.

Given the small context of the other patch, you might as well put both
in together.

Thanks, Roger.

Reply via email to