pv: Adjust eflags handling for FRED mode

Jan Beulich Mon, 01 Sep 2025 08:37:03 -0700

On 28.08.2025 17:04, Andrew Cooper wrote:
> ERETU, unlike IRET, requires the sticky-1 bit (bit 2) be set, and reserved
> bits to be clear.  Notably this means that dom0_construct() must set
> X86_EFLAGS_MBS it in order for a PV dom0 to start.
> 
> Adjust arch_set_info_guest*() and hypercall_iret() which consume flags to
> clamp the reserved bits.
> 
> This is a minor ABI change, but by the same argument as commit
> 9f892f84c279 ("x86/domctl: Stop using XLAT_cpu_user_regs()"), this change will
> happen naturally when the vCPU schedules.


It's no that similar, is it? MBS will be observed set once guest context is
entered, irrespective of any scheduling. So it's entirely benign if we set
it up-front, except of course for a back-to-back set/get.

> Signed-off-by: Andrew Cooper <andrew.coop...@citrix.com>
> ---
> CC: Jan Beulich <jbeul...@suse.com>
> CC: Roger Pau Monné <roger....@citrix.com>
> 
> v2:
>  * New
> 
> The handling of VM is complicated.
> 
> It turns out that it's simply ignored by IRET in Long Mode (i.e. clearing it
> commit 0e47f92b0725 ("x86: force EFLAGS.IF on when exiting to PV guests")
> wasn't actually necessary) but ERETU does care.
> 
> But, it's unclear how to handle this in in arch_set_info().  We must preserve
> it for HVM guests (whih can use vm86 mode).  PV32 has special handling but
> only in hypercall_iret(), not in arch_set_info().

I think we need to either reject or clear VM, NT, IOPL, and whatever else
would make ERETU unhappy (for IOPL we already do so). It simply is of no
use to ...

> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -1273,7 +1273,7 @@ int arch_set_info_guest(
>          v->arch.user_regs.rax               = c.nat->user_regs.rax;
>          v->arch.user_regs.rip               = c.nat->user_regs.rip;
>          v->arch.user_regs.cs                = c.nat->user_regs.cs;
> -        v->arch.user_regs.rflags            = c.nat->user_regs.rflags;
> +        v->arch.user_regs.rflags            = (c.nat->user_regs.rflags & 
> X86_EFLAGS_ALL) | X86_EFLAGS_MBS;
>          v->arch.user_regs.rsp               = c.nat->user_regs.rsp;
>          v->arch.user_regs.ss                = c.nat->user_regs.ss;
>          v->arch.pv.es                       = c.nat->user_regs.es;
> @@ -1297,7 +1297,7 @@ int arch_set_info_guest(
>          v->arch.user_regs.eax               = c.cmp->user_regs.eax;
>          v->arch.user_regs.eip               = c.cmp->user_regs.eip;
>          v->arch.user_regs.cs                = c.cmp->user_regs.cs;
> -        v->arch.user_regs.eflags            = c.cmp->user_regs.eflags;
> +        v->arch.user_regs.eflags            = (c.cmp->user_regs.eflags & 
> X86_EFLAGS_ALL) | X86_EFLAGS_MBS;
>          v->arch.user_regs.esp               = c.cmp->user_regs.esp;
>          v->arch.user_regs.ss                = c.cmp->user_regs.ss;
>          v->arch.pv.es                       = c.cmp->user_regs.es;

... accept the bits here, just for the first exit to guest mode to fault on
the ERETU. The guest would have a hard time to recover from that, I expect.
Yet perhaps we should do this only conditionally when FRED is active. Then
again a VM migrating from a pre-FRED host to a FRED one might observe the
(minor) behavioral change later on.

> --- a/xen/arch/x86/hvm/domain.c
> +++ b/xen/arch/x86/hvm/domain.c
> @@ -194,7 +194,7 @@ int arch_set_info_hvm_guest(struct vcpu *v, const struct 
> vcpu_hvm_context *ctx)
>          uregs->rsi    = regs->esi;
>          uregs->rdi    = regs->edi;
>          uregs->rip    = regs->eip;
> -        uregs->rflags = regs->eflags;
> +        uregs->rflags = (regs->eflags & X86_EFLAGS_ALL) | X86_EFLAGS_MBS;
>  
>          v->arch.hvm.guest_cr[0] = regs->cr0;
>          v->arch.hvm.guest_cr[3] = regs->cr3;
> @@ -245,7 +245,7 @@ int arch_set_info_hvm_guest(struct vcpu *v, const struct 
> vcpu_hvm_context *ctx)
>          uregs->rsi    = regs->rsi;
>          uregs->rdi    = regs->rdi;
>          uregs->rip    = regs->rip;
> -        uregs->rflags = regs->rflags;
> +        uregs->rflags = (regs->rflags & X86_EFLAGS_ALL) | X86_EFLAGS_MBS;

Why would the HVM code need changing at all? We never ERETU there.

Jan

Re: [PATCH v2 23/23] x86/pv: Adjust eflags handling for FRED mode

Reply via email to