On 02/10/2020 16:39, Jan Beulich wrote:
> On 02.10.2020 17:08, Marek Marczykowski-Górecki wrote:
>> I've done another bisect on the commit broken up in separate changes
>> (https://xenbits.xen.org/gitweb/?p=people/andrewcoop/xen.git;a=shortlog;h=refs/heads/dbg-s3)
>> and the bad part seems to be this:
>>
>> From dbdb32f8c265295d6af7cd4cd0aa12b6d04a0430 Mon Sep 17 00:00:00 2001
>> From: Andrew Cooper <andrew.coop...@citrix.com>
>> Date: Fri, 2 Oct 2020 15:40:22 +0100
>> Subject: [PATCH 1/1] CR4
> Interesting - I was wild guessing so yesterday, but couldn't come
> up with even a vague reason why this would be. I think you could
> further split it up:
>
>> --- a/xen/arch/x86/acpi/power.c
>> +++ b/xen/arch/x86/acpi/power.c
>> @@ -195,7 +195,6 @@ static int enter_state(u32 state)
>>      unsigned long flags;
>>      int error;
>>      struct cpu_info *ci;
>> -    unsigned long cr4;
>>  
>>      if ( (state <= ACPI_STATE_S0) || (state > ACPI_S_STATES_MAX) )
>>          return -EINVAL;
>> @@ -270,15 +269,15 @@ static int enter_state(u32 state)
>>  
>>      system_state = SYS_STATE_resume;
>>  
>> -    /* Restore CR4 and EFER from cached values. */
>> -    cr4 = read_cr4();
>> -    write_cr4(cr4 & ~X86_CR4_MCE);
>> +    /* Restore EFER from cached value. */
>>      write_efer(read_efer());
> This one should be possible to leave in place despite ...
>
>>      device_power_up(SAVED_ALL);
>>  
>>      mcheck_init(&boot_cpu_data, false);
>> -    write_cr4(cr4);
>> +
>> +    /* Restore CR4 from cached value, now MCE is set up. */
>> +    write_cr4(read_cr4());
> ... this change.
>
> Further, while I can't see how the set_in_cr4() in mcheck_init()
> could badly interact with the CR4 writes here, another option
> might be to suppress it when system_state == SYS_STATE_resume
> && c == &boot_cpu_data (or !bsp && c == &boot_cpu_data).
>
>> --- a/xen/arch/x86/acpi/suspend.c
>> +++ b/xen/arch/x86/acpi/suspend.c
>> @@ -23,7 +23,4 @@ void save_rest_processor_state(void)
>>  void restore_rest_processor_state(void)
>>  {
>>      load_system_tables();
>> -
>> -    /* Restore full CR4 (inc MCE) now that the IDT is in place. */
>> -    write_cr4(mmu_cr4_features);
>>  }
> This one should be possible to leave in place despite the other
> changes.

We're continuing to debug in private.  mmu_cr4_features and read_cr4()
are equivalent (as expected), but very different from MINIMAL_CR4 which
is what the trampoline configures, so I think we're suffering a
CR4-related #UD/#GP somewhere in device_power_up() or mcheck_init().

Its not INVPCID.  Trying others.

~Andrew

Reply via email to