On 27.03.2025 10:50, Roger Pau Monné wrote:
> On Thu, Mar 27, 2025 at 10:15:03AM +0100, Jan Beulich wrote:
>> On 27.03.2025 09:21, Roger Pau Monné wrote:
>>> On Tue, Mar 25, 2025 at 08:18:11AM +0100, Jan Beulich wrote:
>>>> In particular if we're running virtualized, the underlying hypervisor
>>>> (which may be another Xen) may not surface MTRRs, and offer PAT only.
>>>
>>> At least for Xen, I think we offer MTRR uniformly, even on PVH
>>> guests?
>>
>> By default we do, but we discussed the option of offering PAT-only 
>> environments
>> beyond leaving it open whether people disabling MTRR support in their guest
>> config are outside of supported terrain.
>>
>>>> Fixes: 5a281883cdc3 ("Hardcode many cpu features for x86/64 -- we know 
>>>> 64-bit")
>>>> Signed-off-by: Jan Beulich <jbeul...@suse.com>
>>>
>>> Acked-by: Roger Pau Monné <roger....@citrix.com>
>>>
>>> My main concern is whether the !mtrr path is still functional.  Have
>>> you tried booting the resulting hypervisor with MTRR masked on CPUID?
>>>
>>> (or alternatively short-circuiting cpu_has_mtrr == 0?)
>>
>> I didn't think this would be a prereq here. If we boot in an environment 
>> truly
>> lacking MTRR, we'll crash on the first MTRR MSR access - none of those 
>> accesses
>> use the safe accessors.
> 
> Right, hopefully we don't have unprotected MTRR MSR accesses, so
> cpu_has_mtrr == 0 should prevent those.

Actually we do, see my other patch just posted.

>> Since you asked, I tried booting with "cpuid=no-mtrr".
>> As I'm doing this on a system with console, all I can say is that it takes
>> minutes to reach the point where we'd start setting up Dom0 (which itself 
>> then
>> takes so long that I timed out waiting for it to make progress), due to all
>> screen output becoming unbelievably slow after AP bringup. Surely something's
>> screwed somewhere, as VRAM accesses being slow (or fast) shouldn't depend on 
>> AP
>> bringup having completed. I actually suspect it's not just VRAM accesses 
>> which
>> are slow, but that we're left running in uncachable mode altogether for 
>> whatever
>> reason.
>>
>> What this maybe useful for is to figure out the reason of "Platform timer
>> appears to have unexpectedly wrapped <N> times", which I saw appear once.
>>
>> Given this, I'm actually uncertain whether it is legitimate then to take your
>> ack.
> 
> I think it might be best if we can figure out what causes those issues
> (and possibly fix them) before taking this patch?
> 
> Albeit you could argue that running excruciatingly slow is better than
> just crashing of an unhandled #GP from a rdmsr.

Indeed that's my thinking. But if you prefer, I can wait with this patch until
after the other one has gone in.

Jan

Reply via email to