On 06.01.2025 17:06, Andrew Cooper wrote:
> On 06/01/2025 2:28 pm, Jan Beulich wrote:
>> On 03.01.2025 21:47, Andrew Cooper wrote:
>>> +There are several uses of the vm86 fields in Xen:
>>> +
>>> + #. ``struct vcpu`` embeds a ``struct cpu_user_regs`` to hold GPRs/etc when
>>> +    the vCPU is scheduled out.  The vm86 fields are used by the PV logic only
>>> +    (``{save,load}_segments()``) and can be moved into separate fields in
>>> +    ``struct pv_vcpu``.  PV's ``dom0_construct()`` sets these fields directly,
>>> +    and needs a matching adjustment.
>>> +
>>> + #. As part of ``arch_{get,set}_info_guest()`` during hypercalls.  The
>>> +    guest side needs to remain as-is, but the Xen side can be rearranged to use
>>> +    the new fields from above.
>>> +
>>> + #. As part of vCPU diagnostics (``show_registers()`` etc).  The ``#DF`` path
>>> +    uses the fields as scratch storage for the current register values, while
>>> +    the other diagnostics are simply accessing the state of a scheduled-out
>>> +    vCPU.
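
As a rough illustration of the first item above (a sketch only - the field
names and their exact placement are illustrative, not necessarily what the
branch does), the PV-only selectors would end up along the lines of:

    struct pv_vcpu {
        /* ... existing fields ... */

        /*
         * Selectors previously held in the vm86 fields of the embedded
         * struct cpu_user_regs; used only by {save,load}_segments().
         */
        uint16_t ds, es, fs, gs;
    };

with dom0_construct() and the Xen side of arch_{get,set}_info_guest()
adjusted to read/write these fields instead of the cpu_user_regs copies.
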
>> Unlike for the former 2 points and for the one immediately below, but like for
>> the final one below you don't mention what you intend to do. Here I assume it'll
>> be reasonably straightforward to use scratch space elsewhere.
> 
> https://xenbits.xen.org/gitweb/?p=people/andrewcoop/xen.git;a=shortlog;h=refs/heads/xen-fred
> is my working branch if you want to peek at things.
> 
> The diagnostics are handled by:
> 
> 1) "x86/traps: Rework register state printing to use an extra_state struct"
> 2) "x86/traps: Avoid OoB accesses to print the data selectors"

Doesn't this one remove the sole caller of read_sregs(), and hence also warrant
purging the function itself? Apart from this ...

> 3) "Revert "x86/traps: 'Fix' safety of read_registers() in #DF path""

... these all look fine to me; I'll hold off on a formal R-b, though, until the
actual submission.

> Something else you might want to proactively look at.  "x86/idt:
> Generate bsp_idt[] at build time".  I figured out how to construct the
> IDT at build time without an external tool to format the table, and with
> only some slightly disgusting linker script hackery.

Clever.

>>> +Stack layout
>>> +""""""""""""
>>> +
>>> +Xen's CPU stacks are 8 pages (8-page aligned), arranged as::
>>> +
>>> +  7 - Primary stack (with a struct cpu_info at the top)
>>> +  6 - Primary stack
>>> +  5 - Primary Shadow Stack (read-only)
>>> +  4 - #DF IST stack
>>> +  3 - #DB IST stack
>>> +  2 - NMI IST stack
>>> +  1 - #MC IST stack
>>> +  0 - IST Shadow Stacks (4x 1k, read-only)
>>> +
>>> +which needs mapping into FRED's Stack Levels.
>>> +
>>> +FRED Stack Levels replace IST.  Most events from Ring3 enter Ring0 at SL0,
>>> +including interrupts, and even exceptions with a non-zero Stack Level
>>> +configured.  Nested exceptions originate from Ring0 even if they were trying
>>> +to push a Ring3 event frame onto the stack, so do follow the Ring0 CSL rules.
>>> +
>>> +Within Ring0, a stack switch occurs on event delivery if the event has a
>>> +higher configured Stack Level (exceptions in ``MSR_FRED_STK_LVLS``, interrupts
>>> +in ``MSR_FRED_CONFIG``).  Otherwise, the new event is delivered on the current
>>> +stack.
>>> +
>>> +Under FRED, most sources of ``#DF`` are gone; failure to push a new event
>>> +frame onto a stack is the main remaining one, so ``#DF`` needs to be the
>>> +highest stack level (SL3) to catch errors at all other stack levels.
>>> +
>>> +Also, FRED removes the "syscall gap", removing the primary reason for ``NMI``,
>>> +``#DB`` and ``#MC`` to need separate stacks.
>>> +
>>> +Therefore, Xen has no need for SL1 or SL2.  Under IDT delivery, we poison the
>>> +unused stack pointers with a non-canonical address, but we cannot do that
>>> +under FRED; they're held in MSRs and checked at WRMSR time.  Instead, we can
>>> +point the SL pairs (RSP + SSP) at each others (regular and shadow stack) guard
>>> +pages such that any use of an unused SL will escalate to ``#DF``.
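
For concreteness, the arrangement described above could be programmed along
these lines (a sketch only; the MSR constant and symbol names are illustrative,
X86_EXC_DF is the #DF vector, 8, and the STK_LVLS layout is the 2-bits-per-vector
one from the FRED spec):

    /* #DF at SL3, every other exception at SL0. */
    wrmsrl(MSR_FRED_STK_LVLS, 3ULL << (X86_EXC_DF * 2));

    wrmsrl(MSR_FRED_RSP3, df_stack_top);   /* ex-#DF IST page */
    wrmsrl(MSR_FRED_SSP3, df_shstk_top);   /* #DF slot in the shstk page */

    /*
     * SL1/SL2 are unused.  Their pointers can't be non-canonical (WRMSR
     * would fault), so aim the regular-stack pointer at a read-only
     * (shadow stack) page and the shadow-stack pointer at a regular page,
     * so that any stray use faults and escalates to #DF.
     */
    wrmsrl(MSR_FRED_RSP1, shstk_guard);
    wrmsrl(MSR_FRED_SSP1, stack_guard);
    wrmsrl(MSR_FRED_RSP2, shstk_guard);
    wrmsrl(MSR_FRED_SSP2, stack_guard);
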
>> I may have a language issue here: "each others" reads wrong to me; do you perhaps
>> mean "each ones"?
> 
> It's poor grammar, but not wrong per se.  I'll try to find a different
> way to phrase it.
> 
>>
>> Further, mainly to double check my own understanding: Almost half of the stack
>> area then isn't used anymore when FRED is active; what remains in use is 2 pages
>> for the primary stack, 1 page for the primary shadow stack, 1 page for the SL3
>> stack, and (somewhat wastefully) 1 more for the SL3 shadow stack.
> 
> This matches my understanding (on the proviso that I've still not wired
> up the stack handling yet.  Still cleaning up the initialisation paths.)
> 
>>  There'll be 3 unused pages, and hence space again to have true guard pages, e.g.
>>
>>   7 - Primary stack (with a struct cpu_info at the top)
>>   6 - Primary stack
>>   5 - Guard page
>>   4 - Primary Shadow Stack (read-only)
>>   3 - Guard page
>>   2 - #DF stack
>>   1 - #DF Shadow Stack (read-only)
>>   0 - Guard page
> 
> Shadow stack frames are perfectly good guard pages for regular stacks,
> and vice versa.  That's why I set them up as shadow stack pages even
> when CET isn't active.

"Perfectly valid" isn't quite true: These pages being readable means
writes below the stack bottom (likely the prevailing kind of problem)
are detected, but reads wouldn't be.

> And yes, we could rearrange the stack.  But, there's also a good reason
> not to.  Our code has to cope with both IDT and FRED layouts, which is
> much easier if they're the same.

I certainly can see the value of keeping stack layout uniform.
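
For reference, with the layout kept uniform, my understanding of how it then
gets consumed under FRED (MSR names per the FRED spec, for illustration only):

  7 - Primary stack (struct cpu_info at the top)   SL0: IA32_FRED_RSP0
  6 - Primary stack
  5 - Primary Shadow Stack (read-only)             SL0: IA32_PL0_SSP
  4 - ex-#DF IST stack                             SL3: IA32_FRED_RSP3
  3 - ex-#DB IST stack                             unused under FRED
  2 - ex-NMI IST stack                             unused under FRED
  1 - ex-#MC IST stack                             unused under FRED
  0 - IST Shadow Stacks (4x 1k, read-only)         SL3: IA32_FRED_SSP3 (#DF slot)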

>> Having reached the bottom of the section, there's one special IST aspect that
>> I'm missing, special enough imo to warrant mentioning even if only to state that
>> it's (hopefully) going to be irrelevant (i.e. not require replacing by another
>> similar hack): {dis,en}able_each_ist() as used by SVM (on the assumption that
>> sooner or later AMD is likely to also implement FRED, and that you may already
>> know of details of their respective VMCB integration).
> 
> AMD haven't said anything about FRED yet, despite a very large number of
> software partners asking about it.
> 
> However, if AMD were to do FRED, I'd expect it to work just like CET
> does today, seeing as there's a proper host/guest split of CR4, and
> everything else is in MSRs the guest can't touch.

As in "can be prevented to touch directly", aiui.

Jan
