On 28.08.2025 17:04, Andrew Cooper wrote: > ERETU, unlike IRET, requires the sticky-1 bit (bit 2) be set, and reserved > bits to be clear. Notably this means that dom0_construct() must set > X86_EFLAGS_MBS it in order for a PV dom0 to start. > > Adjust arch_set_info_guest*() and hypercall_iret() which consume flags to > clamp the reserved bits. > > This is a minor ABI change, but by the same argument as commit > 9f892f84c279 ("x86/domctl: Stop using XLAT_cpu_user_regs()"), this change will > happen naturally when the vCPU schedules.
It's no that similar, is it? MBS will be observed set once guest context is entered, irrespective of any scheduling. So it's entirely benign if we set it up-front, except of course for a back-to-back set/get. > Signed-off-by: Andrew Cooper <andrew.coop...@citrix.com> > --- > CC: Jan Beulich <jbeul...@suse.com> > CC: Roger Pau Monné <roger....@citrix.com> > > v2: > * New > > The handling of VM is complicated. > > It turns out that it's simply ignored by IRET in Long Mode (i.e. clearing it > commit 0e47f92b0725 ("x86: force EFLAGS.IF on when exiting to PV guests") > wasn't actually necessary) but ERETU does care. > > But, it's unclear how to handle this in in arch_set_info(). We must preserve > it for HVM guests (whih can use vm86 mode). PV32 has special handling but > only in hypercall_iret(), not in arch_set_info(). I think we need to either reject or clear VM, NT, IOPL, and whatever else would make ERETU unhappy (for IOPL we already do so). It simply is of no use to ... > --- a/xen/arch/x86/domain.c > +++ b/xen/arch/x86/domain.c > @@ -1273,7 +1273,7 @@ int arch_set_info_guest( > v->arch.user_regs.rax = c.nat->user_regs.rax; > v->arch.user_regs.rip = c.nat->user_regs.rip; > v->arch.user_regs.cs = c.nat->user_regs.cs; > - v->arch.user_regs.rflags = c.nat->user_regs.rflags; > + v->arch.user_regs.rflags = (c.nat->user_regs.rflags & > X86_EFLAGS_ALL) | X86_EFLAGS_MBS; > v->arch.user_regs.rsp = c.nat->user_regs.rsp; > v->arch.user_regs.ss = c.nat->user_regs.ss; > v->arch.pv.es = c.nat->user_regs.es; > @@ -1297,7 +1297,7 @@ int arch_set_info_guest( > v->arch.user_regs.eax = c.cmp->user_regs.eax; > v->arch.user_regs.eip = c.cmp->user_regs.eip; > v->arch.user_regs.cs = c.cmp->user_regs.cs; > - v->arch.user_regs.eflags = c.cmp->user_regs.eflags; > + v->arch.user_regs.eflags = (c.cmp->user_regs.eflags & > X86_EFLAGS_ALL) | X86_EFLAGS_MBS; > v->arch.user_regs.esp = c.cmp->user_regs.esp; > v->arch.user_regs.ss = c.cmp->user_regs.ss; > v->arch.pv.es = c.cmp->user_regs.es; ... accept the bits here, just for the first exit to guest mode to fault on the ERETU. The guest would have a hard time to recover from that, I expect. Yet perhaps we should do this only conditionally when FRED is active. Then again a VM migrating from a pre-FRED host to a FRED one might observe the (minor) behavioral change later on. > --- a/xen/arch/x86/hvm/domain.c > +++ b/xen/arch/x86/hvm/domain.c > @@ -194,7 +194,7 @@ int arch_set_info_hvm_guest(struct vcpu *v, const struct > vcpu_hvm_context *ctx) > uregs->rsi = regs->esi; > uregs->rdi = regs->edi; > uregs->rip = regs->eip; > - uregs->rflags = regs->eflags; > + uregs->rflags = (regs->eflags & X86_EFLAGS_ALL) | X86_EFLAGS_MBS; > > v->arch.hvm.guest_cr[0] = regs->cr0; > v->arch.hvm.guest_cr[3] = regs->cr3; > @@ -245,7 +245,7 @@ int arch_set_info_hvm_guest(struct vcpu *v, const struct > vcpu_hvm_context *ctx) > uregs->rsi = regs->rsi; > uregs->rdi = regs->rdi; > uregs->rip = regs->rip; > - uregs->rflags = regs->rflags; > + uregs->rflags = (regs->rflags & X86_EFLAGS_ALL) | X86_EFLAGS_MBS; Why would the HVM code need changing at all? We never ERETU there. Jan