On 07/05/2026 3:08 pm, Sean Christopherson wrote:
> On Thu, May 07, 2026, David Woodhouse wrote:
>> From: David Woodhouse <[email protected]>
>>
>> ICEBP (INT1, opcode 0xF1) generates a #DB that is architecturally a
>> trap, but on SVM it was not always intercepted. Unconditionally
>> intercept ICEBP on SVM to match VMX behaviour and ensure correct
>> event delivery semantics.
>>
>> Add two selftests exercising ICEBP:
>>
>>  - int1_ept_test: verifies that ICEBP works correctly when the
>>    exception stack page is not present (EPT/NPT fault during #DB
>>    delivery). The IST stack is evicted via MADV_DONTNEED before
>>    executing INT1.
>>
>>  - int1_task_gate_test: verifies ICEBP delivery through a 32-bit
>>    task gate, exercising the legacy task-switch path for #DB.
>>
>> Tested on Intel Sapphire Rapids and AMD Genoa. Without the SVM fix,
>> int1_task_gate_test fails on AMD with EIP pointing at ICEBP instead
>> of after it. With the fix, both tests pass on both platforms.
> Hmm, but KVM unconditionally intercepts task switches.  Is this effectively
> working around a bug in task_switch_interception()?

Not really.  It's a bug/misfeature in AMD CPUs.

When you get TASK_SWITCH (which always has fault semantics), you look at
the vectoring event type to decide whether it was logically caused by a
trap, and therefore whether to move %rip forwards before entering the
new task.
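
As a rough sketch of that decision (a toy model only: the enum, the helper
names and the explicit insn_len parameter are all made up for illustration
and are not KVM's actual code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical classification of the event that was being delivered. */
enum vectoring_type {
    VEC_EXTERNAL_INTR,  /* asynchronous, %rip is already correct */
    VEC_HW_EXCEPTION,   /* exception-induced, e.g. a single-step #DB */
    VEC_SOFT_INTR,      /* INT n */
    VEC_PRIV_SOFT_EXC,  /* ICEBP / INT1 */
    VEC_SOFT_EXC,       /* INT3, INTO */
};

/* True if the event was raised by an instruction with trap semantics. */
static bool caused_by_trapping_insn(enum vectoring_type t)
{
    return t == VEC_SOFT_INTR || t == VEC_PRIV_SOFT_EXC || t == VEC_SOFT_EXC;
}

/*
 * The intercept itself always reports %rip at the start of the
 * instruction (fault semantics).  For a trap-like cause, the outgoing
 * task must record the address of the *next* instruction instead.
 */
static uint64_t rip_saved_in_old_task(uint64_t rip, enum vectoring_type t,
                                      uint8_t insn_len)
{
    if (!caused_by_trapping_insn(t))
        return rip;

    /* Moving forwards is only possible if the length is known. */
    assert(insn_len > 0);
    return rip + insn_len;
}
```

Both inputs to the trap-like branch, the event type and the instruction
length, have to come from the hardware for this to work.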

AMD CPUs don't distinguish instruction-induced #DBs (i.e. ICEBP) from
exception-induced #DBs (all others), and also don't report an
instruction length for an ICEBP-induced TASK_SWITCH.

The workaround is to intercept ICEBP unconditionally and handle the
FAULT->TRAP conversion in the hypervisor.  The #DB-induced TASK_SWITCH
then occurs with %rip on the correct instruction boundary, whether it
was instruction-induced or exception-induced.
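
Very roughly, and again only as a toy model (struct vcpu_model, the helper
names and the fixed one-byte length for an unprefixed 0xF1 are assumptions
for illustration, not what the patch actually does):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed length of an unprefixed ICEBP (opcode 0xF1). */
#define ICEBP_LEN 1

/* Hypothetical, minimal stand-in for vCPU state. */
struct vcpu_model {
    uint64_t rip;
    bool db_pending;  /* a #DB is queued for injection */
};

/*
 * ICEBP intercept: perform the FAULT->TRAP conversion up front by
 * stepping %rip past the instruction, then queue an ordinary #DB.
 */
static void on_icebp_intercept(struct vcpu_model *v)
{
    assert(!v->db_pending);
    v->rip += ICEBP_LEN;
    v->db_pending = true;
}

/*
 * Task switch caused by delivering that #DB: no adjustment is needed,
 * because %rip is already on the right boundary whichever way the #DB
 * was raised.
 */
static uint64_t rip_for_db_task_switch(const struct vcpu_model *v)
{
    return v->rip;
}
```

The point is that the task-switch path no longer has to know why the #DB
happened, or how long the instruction was.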

~Andrew
