On 07/05/2026 3:08 pm, Sean Christopherson wrote:
> On Thu, May 07, 2026, David Woodhouse wrote:
>> From: David Woodhouse <[email protected]>
>>
>> ICEBP (INT1, opcode 0xF1) generates a #DB that is architecturally a
>> trap, but on SVM it was not always intercepted. Unconditionally
>> intercept ICEBP on SVM to match VMX behaviour and ensure correct
>> event delivery semantics.
>>
>> Add two selftests exercising ICEBP:
>>
>> - int1_ept_test: verifies that ICEBP works correctly when the
>>   exception stack page is not present (EPT/NPT fault during #DB
>>   delivery). The IST stack is evicted via MADV_DONTNEED before
>>   executing INT1.
>>
>> - int1_task_gate_test: verifies ICEBP delivery through a 32-bit
>>   task gate, exercising the legacy task-switch path for #DB.
>>
>> Tested on Intel Sapphire Rapids and AMD Genoa. Without the SVM fix,
>> int1_task_gate_test fails on AMD with EIP pointing at ICEBP instead
>> of after it. With the fix, both tests pass on both platforms.
> Hmm, but KVM unconditionally intercepts task switches. Is this effectively
> working around a bug in task_switch_interception()?

Not really. It's a bug/misfeature in AMD CPUs.

When you get TASK_SWITCH (which always has fault semantics), you look at
the vectoring event type to decide whether it was logically caused by a
trap, and therefore whether to move %rip forwards before entering the
new task.

AMD CPUs don't distinguish instruction-induced #DBs (i.e. ICEBP) from
exception-induced #DBs (all others), and also don't report an
instruction length for an ICEBP-induced TASK_SWITCH.

The workaround is to intercept ICEBP unconditionally, handle the
FAULT->TRAP conversion in the hypervisor, at which point the #DB-induced
TASK_SWITCH occurs with %rip on the correct instruction boundary whether
it was instruction-induced or exception-induced.

~Andrew
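
For illustration only, the %rip decision described above can be modelled
in a few lines of C. This is a rough sketch, not KVM code: the enum, the
struct and both function names are hypothetical, and the event types are
a simplified classification rather than the real VMCS/VMCB encodings.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical classification of the event that was being delivered
 * when the TASK_SWITCH exit occurred. Not the hardware encodings. */
enum vectoring_type {
    VEC_HW_EXCEPTION,      /* exception-induced, e.g. a single-step #DB */
    VEC_SW_INTERRUPT,      /* INT n */
    VEC_SW_EXCEPTION,      /* INT3 / INTO */
    VEC_PRIV_SW_EXCEPTION, /* ICEBP (INT1) */
};

/* Hypothetical view of what the hypervisor sees on the exit. */
struct task_switch_exit {
    enum vectoring_type type;
    uint64_t rip;     /* TASK_SWITCH has fault semantics, so this is
                       * the %rip at the start of event delivery */
    uint8_t insn_len; /* 0 if hardware did not report a length */
};

/* Instruction-induced events are logically traps: the saved %rip must
 * point after the instruction that raised them. */
static bool insn_induced(enum vectoring_type type)
{
    return type != VEC_HW_EXCEPTION;
}

/* %rip to save in the outgoing task before entering the new one. */
static uint64_t rip_to_save(const struct task_switch_exit *e)
{
    if (insn_induced(e->type))
        return e->rip + e->insn_len;
    return e->rip;
}
```

If an ICEBP-induced exit is reported as VEC_HW_EXCEPTION with insn_len
of 0, rip_to_save() returns the unmodified %rip, i.e. still pointing at
ICEBP, which is the failure int1_task_gate_test reports on AMD.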

