On 17.11.2023 10:18, James Dingwall wrote:
> On Thu, Nov 16, 2023 at 04:32:47PM +0000, Andrew Cooper wrote:
>> On 16/11/2023 4:15 pm, James Dingwall wrote:
>>> Hi,
>>>
>>> Per the msr_relaxed documentation:
>>>
>>>    "If using this option is necessary to fix an issue, please report a bug."
>>>
>>> After recently upgrading an environment from Xen 4.14.5 to Xen 4.15.5 we
>>> started experiencing a BSOD at boot with one of our Windows guests.  We 
>>> found
>>> that enabling `msr_relaxed = 1` in the guest configuration has resolved the
>>> problem.  With a debug build of Xen and `hvm_debug=2048` on the command line
>>> the following messages were caught as the BSOD happened:
>>>
>>> (XEN) [HVM:11.0] <vmx_msr_read_intercept> ecx=0x1a2
>>> (XEN) vmx.c:3298:d11v0 RDMSR 0x000001a2 unimplemented
>>> (XEN) d11v0 VIRIDIAN CRASH: 1e ffffffffc0000096 fffff80b8de81eb5 0 0
>>>
>>> I found that MSR 0x1a2 is MSR_TEMPERATURE_TARGET and from that this patch
>>> series from last month:
>>>
>>> https://patchwork.kernel.org/project/xen-devel/list/?series=796550
>>>
>>> Picking out just a small part of that fixes the problem for us. Although the
>>> the patch is against 4.15.5 I think it would be relevant to more recent
>>> releases too.
>>
>> Which version of Windows, and what hardware?
>>
>> The Viridian Crash isn't about the RDMSR itself - it's presumably
>> collateral damage shortly thereafter.
>>
>> Does filling in 0 for that MSR also resolve the issue?  It's model
>> specific and we absolutely cannot pass it through from real hardware
>> like that.
>>
> 
> Hi Andrew,
> 
> Thanks for your response.  The guest is running Windows 10 and the crash
> happens in a proprietary hardware driver.  A little bit of knowledge as
> they say was enough to stop the crash but I don't understand the impact
> of what I've actually done...
> 
> To rework the patch I'd need a bit of guidance, if I understand your
> suggestion I set the MSR to 0 with this change in emul-priv-op.c:

For the purpose of the experiment suggested by Andrew ...

> diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c
> index ed97b1d6fcc..66f5e417df6 100644
> --- a/xen/arch/x86/pv/emul-priv-op.c
> +++ b/xen/arch/x86/pv/emul-priv-op.c
> @@ -976,6 +976,10 @@ static int read_msr(unsigned int reg, uint64_t *val,
>          *val = 0;
>          return X86EMUL_OKAY;
>  
> +    case MSR_TEMPERATURE_TARGET:
> +        *val = 0;
> +        return X86EMUL_OKAY;
> +
>      case MSR_P6_PERFCTR(0) ... MSR_P6_PERFCTR(7):
>      case MSR_P6_EVNTSEL(0) ... MSR_P6_EVNTSEL(3):
>      case MSR_CORE_PERF_FIXED_CTR0 ... MSR_CORE_PERF_FIXED_CTR2:

... you wouldn't need this (affects PV domains only), and ...

> and this in vmx.c:
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index 54023a92587..bbf37b7f272 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -3259,6 +3259,11 @@ static int vmx_msr_read_intercept(unsigned int msr, 
> uint64_t *msr_content)
>          if ( !nvmx_msr_read_intercept(msr, msr_content) )
>              goto gp_fault;
>          break;
> +
> +    case MSR_TEMPERATURE_TARGET:
> +        *msr_content = 0;
> +        break;
> +
>      case MSR_IA32_MISC_ENABLE:
>          rdmsrl(MSR_IA32_MISC_ENABLE, *msr_content);
>          /* Debug Trace Store is not supported. */

... indeed this ought to do. An eventual real patch may want to look
different, though.

Jan

Reply via email to