On 14/08/2025 4:57 pm, Jan Beulich wrote:
> On 08.08.2025 22:23, Andrew Cooper wrote:
>> --- a/xen/arch/x86/include/asm/asm_defns.h
>> +++ b/xen/arch/x86/include/asm/asm_defns.h
>> @@ -315,6 +315,71 @@ static always_inline void stac(void)
>>          subq  $-(UREGS_error_code-UREGS_r15+\adj), %rsp
>>  .endm
>>  
>> +/*
>> + * Push and clear GPRs
>> + */
>> +.macro PUSH_AND_CLEAR_GPRS
>> +        push  %rdi
>> +        xor   %edi, %edi
>> +        push  %rsi
>> +        xor   %esi, %esi
>> +        push  %rdx
>> +        xor   %edx, %edx
>> +        push  %rcx
>> +        xor   %ecx, %ecx
>> +        push  %rax
>> +        xor   %eax, %eax
>> +        push  %r8
>> +        xor   %r8d, %r8d
>> +        push  %r9
>> +        xor   %r9d, %r9d
>> +        push  %r10
>> +        xor   %r10d, %r10d
>> +        push  %r11
>> +        xor   %r11d, %r11d
>> +        push  %rbx
>> +        xor   %ebx, %ebx
>> +        push  %rbp
>> +#ifdef CONFIG_FRAME_POINTER
>> +/* Indicate special exception stack frame by inverting the frame pointer. */
>> +        mov   %rsp, %rbp
>> +        notq  %rbp
>> +#else
>> +        xor   %ebp, %ebp
>> +#endif
>> +        push  %r12
>> +        xor   %r12d, %r12d
>> +        push  %r13
>> +        xor   %r13d, %r13d
>> +        push  %r14
>> +        xor   %r14d, %r14d
>> +        push  %r15
>> +        xor   %r15d, %r15d
>> +.endm
>> +
>> +/*
>> + * POP GPRs from a UREGS_* frame on the stack.  Does not modify flags.
>> + *
>> + * @rax: Alternative destination for the %rax value on the stack.
>> + */
>> +.macro POP_GPRS rax=%rax
>> +        pop   %r15
>> +        pop   %r14
>> +        pop   %r13
>> +        pop   %r12
>> +        pop   %rbp
>> +        pop   %rbx
>> +        pop   %r11
>> +        pop   %r10
>> +        pop   %r9
>> +        pop   %r8
>> +        pop   \rax
>> +        pop   %rcx
>> +        pop   %rdx
>> +        pop   %rsi
>> +        pop   %rdi
>> +.endm
> Hmm, yes, differences are apparently large enough to warrant the redundancy
> with SAVE_ALL / RESTORE_ALL.

You may recall this construct from prior attempts I've had to remove
SAVE_ALL/RESTORE_ALL, even with the \rax parameter for SVM.  I still
intend to complete that work at some point.

>
>> --- a/xen/arch/x86/include/asm/msr.h
>> +++ b/xen/arch/x86/include/asm/msr.h
>> @@ -202,9 +202,9 @@ static inline unsigned long read_gs_base(void)
>>  
>>  static inline unsigned long read_gs_shadow(void)
>>  {
>> -    unsigned long base;
>> +    unsigned long base, cr4 = read_cr4();
>>  
>> -    if ( read_cr4() & X86_CR4_FSGSBASE )
>> +    if ( !(cr4 & X86_CR4_FRED) && (cr4 & X86_CR4_FSGSBASE) )
>>      {
>>          asm volatile ( "swapgs" );
>>          base = __rdgsbase();
>> @@ -234,7 +234,9 @@ static inline void write_gs_base(unsigned long base)
>>  
>>  static inline void write_gs_shadow(unsigned long base)
>>  {
>> -    if ( read_cr4() & X86_CR4_FSGSBASE )
>> +    unsigned long cr4 = read_cr4();
>> +
>> +    if ( !(cr4 & X86_CR4_FRED) && (cr4 & X86_CR4_FSGSBASE) )
>>      {
>>          asm volatile ( "swapgs\n\t"
>>                         "wrgsbase %0\n\t"
> I don't quite get how these changes fit into this patch.

Without the change, read_registers() suffers #UD because of the SWAPGS.

This recurses until hitting the guard page, then repeats the same on the
#DF stack.  And because stacks work nicely under FRED, you eventually
hit #DF's guard page.

Strictly speaking it's only read_gs_shadow() which we need to change to
make exception handling work, but I fixed both at the same time.

That said, I have actually cleaned this codepath up with the MSR work
because the code gen in read_registers() is terrible.  Due to
no-strict-aliasing, every store into state-> forces a recalculation of
get_cpu_info(), meaning that read_cr4() cannot be hoisted, and there's a
branch in every helper.

I'm still not sure how best to fit it into this series.

>
>> --- a/xen/arch/x86/traps.c
>> +++ b/xen/arch/x86/traps.c
>> @@ -1013,6 +1013,32 @@ void show_execution_state_nmi(const cpumask_t *mask, 
>> bool show_all)
>>          printk("Non-responding CPUs: {%*pbl}\n", 
>> CPUMASK_PR(&show_state_mask));
>>  }
>>  
>> +static const char *x86_et_name(unsigned int type)
>> +{
>> +    static const char *const names[] = {
>> +        [X86_ET_EXT_INTR]    = "EXT_INTR",
>> +        [X86_ET_NMI]         = "NMI",
>> +        [X86_ET_HW_EXC]      = "HW_EXC",
>> +        [X86_ET_SW_INT]      = "SW_INT",
>> +        [X86_ET_PRIV_SW_EXC] = "PRIV_SW_EXEC",
>> +        [X86_ET_SW_EXC]      = "SW_EXEC",
>> +        [X86_ET_OTHER]       = "OTHER",
>> +    };
>> +
>> +    return (type < ARRAY_SIZE(names) && names[type]) ? names[type] : "???";
>> +}
>> +
>> +static const char *x86_et_other_name(unsigned int vec)
> This isn't really a vector, is it?

Well - you are decoding the field name regs->fred_ss.vector.

Also I see I've typo'd EXEC in two of those names.

>
>> +{
>> +    static const char *const names[] = {
>> +        [0] = "MTF",
>> +        [1] = "SYSCALL",
>> +        [2] = "SYSENTER",
>> +    };
>> +
>> +    return (vec < ARRAY_SIZE(names) && names[vec][0]) ? names[vec] : "???";
> Did you mean to check names[ves] for being NULL? Or is this a leftover
> from the array being something like names[][10]?

Oh, bad copy&paste.  Will fix.

>
>> --- a/xen/arch/x86/x86_64/Makefile
>> +++ b/xen/arch/x86/x86_64/Makefile
>> @@ -1,6 +1,7 @@
>>  obj-$(CONFIG_PV32) += compat/
>>  
>>  obj-bin-y += entry.o
>> +obj-bin-y += entry-fred.o
> For the ordering here, ...
>
>> --- /dev/null
>> +++ b/xen/arch/x86/x86_64/entry-fred.S
>> @@ -0,0 +1,35 @@
>> +/* SPDX-License-Identifier: GPL-2.0-or-later */
>> +
>> +        .file "x86_64/entry-fred.S"
>> +
>> +#include <asm/asm_defns.h>
>> +#include <asm/page.h>
>> +
>> +        .section .text.entry, "ax", @progbits
>> +
>> +        /* The Ring3 entry point is required to be 4k aligned. */
>> +
>> +FUNC(entry_FRED_R3, 4096)
> ... doesn't this 4k-alignment requirement suggest we want to put
> entry-fred.o first?

Perhaps, but that is quite subtle.  I did also consider a
.text.entry.page_aligned section, but .text.entry only matters for XPTI
which (as agreed), I'm not intending to implement in FRED mode unless it
proves to be necessary.

Also IIRC there's still a symbol bug where _sentrytext takes priority
over entry_FRED_R3, so the backtrace is effectively wrong.

(These are all bad excuses, but some parts of this series are rather old.)

>  Also, might it be more natural to use PAGE_SIZE
> here?

I did debate that, but the spec uses 0xfff, not pages, even if the
pipline surely does have an optimisation for chopping 12 metadata bits
off the bottom of a pointer.

~Andrew

Reply via email to