+AMD folks
On Thu, May 07, 2026, Sean Christopherson wrote:
> On Thu, May 07, 2026, Andrew Cooper wrote:
> > On 07/05/2026 3:08 pm, Sean Christopherson wrote:
> > > On Thu, May 07, 2026, David Woodhouse wrote:
> > >> From: David Woodhouse <[email protected]>
> > >>
> > >> ICEBP (INT1, opcode 0xF1) generates a #DB that is architecturally a
> > >> trap, but on SVM it was not always intercepted. Unconditionally
> > >> intercept ICEBP on SVM to match VMX behaviour and ensure correct
> > >> event delivery semantics.
> > >>
> > >> Add two selftests exercising ICEBP:
> > >>
> > >>  - int1_ept_test: verifies that ICEBP works correctly when the
> > >>    exception stack page is not present (EPT/NPT fault during #DB
> > >>    delivery). The IST stack is evicted via MADV_DONTNEED before
> > >>    executing INT1.
> > >>
> > >>  - int1_task_gate_test: verifies ICEBP delivery through a 32-bit
> > >>    task gate, exercising the legacy task-switch path for #DB.
> > >>
> > >> Tested on Intel Sapphire Rapids and AMD Genoa. Without the SVM fix,
> > >> int1_task_gate_test fails on AMD with EIP pointing at ICEBP instead
> > >> of after it. With the fix, both tests pass on both platforms.
> > > Hmm, but KVM unconditionally intercepts task switches. Is this effectively
> > > working around a bug in task_switch_interception()?
> >
> > Not really. It's a bug/misfeature in AMD CPUs.
> >
> > When you get TASK_SWITCH (which always has fault semantics), you look at
> > the vectoring event type to decide whether it was logically caused by a
> > trap, and therefore whether to move %rip forwards before entering the
> > new task.
> >
> > AMD CPUs don't distinguish instruction-induced #DBs (i.e. ICEBP) from
> > exception-induced #DBs (all others), and also don't report an
> > instruction length for an ICEBP-induced TASK_SWITCH.
>
> Heh, that explains why I couldn't find an equivalent of INTR_TYPE_PRIV_SW_EXCEPTION
> in the SVM code.

Dragging in a comment/concern Andrew raised offlist. If AMD doesn't provide or
*allow* the equivalent of INTR_TYPE_PRIV_SW_EXCEPTION, i.e. type 5, then what
happens when KVM needs to inject an INT1 #DB with FRED enabled? Per Intel's FRED
spec, which presumably AMD is following, the event type is shoved onto the stack:

  — For INT1, the event stack level is IA32_FRED_STKLVLS[3:2]. The event type is
    5 (privileged software exception) and the vector is 1.

But if SVM doesn't support SVM_EVTINJ_TYPE_INT1, then realistically this can't
work (no way in hell is KVM going to emulate FRED event delivery). Does FRED on
AMD even do the right thing for INT1 without SVM?

> > The workaround is to intercept ICEBP unconditionally, handle the
> > FAULT->TRAP conversion in the hypervisor, at which point the #DB-induced
> > TASK_SWITCH occurs with %rip on the correct instruction boundary whether
> > it was instruction-induced or exception-induced.
> >
> > ~Andrew
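
For anyone following along, here is a rough sketch of the %rip decision Andrew
describes above, in plain C. The enum values follow the VMX interruption-type
encoding (type 5 being the privileged software exception). The names evt_type,
evt_is_instruction_induced() and task_switch_saved_rip() are made up for
illustration; this is not KVM code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Event types, numbered as in the VMX interruption-type field. */
enum evt_type {
	EVT_EXT_INTR          = 0,
	EVT_NMI               = 2,
	EVT_HW_EXCEPTION      = 3,
	EVT_SW_INTR           = 4,	/* INT n */
	EVT_PRIV_SW_EXCEPTION = 5,	/* ICEBP / INT1 */
	EVT_SW_EXCEPTION      = 6,	/* INT3, INTO */
};

/* Hypothetical helper: was the vectoring event raised by an instruction? */
static bool evt_is_instruction_induced(enum evt_type type)
{
	return type == EVT_SW_INTR ||
	       type == EVT_PRIV_SW_EXCEPTION ||
	       type == EVT_SW_EXCEPTION;
}

/*
 * Hypothetical helper: the task-switch exit has fault semantics, so @rip
 * still points at the instruction.  Skip it only if the instruction
 * itself raised the event.
 */
static uint64_t task_switch_saved_rip(uint64_t rip, enum evt_type type,
				      uint8_t insn_len)
{
	return evt_is_instruction_induced(type) ? rip + insn_len : rip;
}
```

Type 5 is the case that can't be told apart from a plain #DB on SVM, and there
is no instruction length to add either, hence doing the advance in the
hypervisor at ICEBP intercept time, before the #DB is delivered.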