On 25/02/2022 08:38, Jan Beulich wrote:
> On 24.02.2022 20:48, Andrew Cooper wrote:
>> The original shadow stack support has an error on S3 resume with very bizarre
>> fallout.  The BSP comes back up, but APs fail with:
>>
>>   (XEN) Enabling non-boot CPUs ...
>>   (XEN) Stuck ??
>>   (XEN) Error bringing CPU1 up: -5
>>
>> and then later (on at least two Intel TigerLake platforms), the next HVM vCPU
>> to be scheduled on the BSP dies with:
>>
>>   (XEN) d1v0 Unexpected vmexit: reason 3
>>   (XEN) domain_crash called from vmx.c:4304
>>   (XEN) Domain 1 (vcpu#0) crashed on cpu#0:
>>
>> The VMExit reason is EXIT_REASON_INIT, which has nothing to do with the
>> scheduled vCPU, and will be addressed in a subsequent patch.  It is a
>> consequence of the APs triple faulting.
>>
>> The APs triple fault because we don't tear down the stacks on suspend.  The
>> idle/play_dead loop is killed in the middle of running, meaning that the
>> supervisor token is left busy.
>>
>> On resume, SETSSBSY finds the token already busy, suffers #CP and triple
>> faults because the IDT isn't configured this early.
>>
>> Rework the AP bringup path to (re)create the supervisor token.  This ensures
>> the primary shadow stack's token is non-busy before use.
>>
>> Fixes: b60ab42db2f0 ("x86/shstk: Activate Supervisor Shadow Stacks")
>> Link: https://github.com/QubesOS/qubes-issues/issues/7283
>> Reported-by: Thiner Logoer <logoerthin...@163.com>
>> Reported-by: Marek Marczykowski-Górecki <marma...@invisiblethingslab.com>
>> Signed-off-by: Andrew Cooper <andrew.coop...@citrix.com>
>> Tested-by: Thiner Logoer <logoerthin...@163.com>
>> Tested-by: Marek Marczykowski-Górecki <marma...@invisiblethingslab.com>
> Reviewed-by: Jan Beulich <jbeul...@suse.com>

Thanks.
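
For the archives, the shape of the fix is to have the bringup path rewrite the
primary supervisor shadow stack token before executing SETSSBSY, so a busy bit
left behind by suspend can't survive into resume.  As a rough sketch only (the
helper name and stack layout here are illustrative, not the exact code in the
patch):

    /*
     * A free supervisor token is the linear address of the token itself
     * with bit 0 (busy) clear.  Rewriting it unconditionally discards any
     * stale busy bit, after which SETSSBSY can claim it as normal.
     */
    unsigned long *token = (unsigned long *)(shstk_top - 8);

    wrss((unsigned long)token, token); /* *token = token, via a WRSSQ wrapper */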

>> Slightly RFC.  This does fix the crash encountered, but it occurs to me that
>> there's a race condition when S3 platform powerdown coincides with an NMI/#MC,
>> in which case more than just the primary shadow stack's token can end up busy
>> on resume.
>>
>> A larger fix would be to change how we allocate tokens, and always have each
>> CPU set up its own tokens.  I didn't do this originally in the hopes of having
>> WRSSQ generally disabled, but that plan failed when encountering reality...
> While I think this wants fixing one way or another, I also think this
> shouldn't block the immediate fix here (which addresses an unconditional
> crash rather than a pretty unlikely one).

Fair point.  I'll get this committed now, and work on the IST shstks later.
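
For reference, the eventual per-CPU version would look something along these
lines (purely illustrative, invented names), run on each CPU during bringup
before its first SETSSBSY:

    /* Recreate every supervisor token this CPU will use, busy bit clear. */
    static void recreate_shstk_tokens(unsigned long shstk_base)
    {
        unsigned int i;

        /* Primary shadow stack token. */
        write_sss_token(shstk_base + PRIMARY_SHSTK_OFFSET);

        /* One token per IST shadow stack (#DB, NMI, #DF, #MC). */
        for ( i = 0; i < N_IST_SHSTKS; i++ )
            write_sss_token(shstk_base + IST_SHSTK_OFFSET(i));
    }

where write_sss_token(addr) uses WRSSQ to store addr (with the busy bit clear)
at addr.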

As a note for backporting, this depends on the adjustments made in c/s
311434bfc9d1, so it isn't safe to backport in exactly this form.  I'll sort
something out in due course.

~Andrew
