Hi Andrew,

On 25. 02. 25 at 22:14, Andrew Cooper wrote:
As stand-in for "the reader", I'll point out that you need to add #DB to
that list or you're in for a rude surprise when running the x86 selftests.

Thanks for pointing this out. I forgot about the interrupt shadow on SYSCALL,
and possibly about some breakpoint cases in the kernel.

The SYSCALL/SYSENTER startup has interrupts disabled, so it is a problem
for the NMI/#MC handlers, which would need to deal with both the normal
case and the attack case.

Right, but in the case of the attack, regular interrupts are most likely
enabled too.  And writing this has just caused me to realise a
yet-more-fun case.
An interrupt hitting the syscall entry path (prior to SWAPGS) will cause
the interrupt handler's CPL check and conditional SWAPGS to do the wrong
thing and switch onto the user GS base too.  (Prior research e.g.
GhostRace has shown how to get an hrtimer to reliably hit an instruction
boundary.)

I don't see it: if the attacker starts at the syscall entry with interrupts enabled and an interrupt happens right there, the handler will just see a proper IRET frame with the kernel %cs and will not perform SWAPGS. I will try to think about it again tomorrow; I have likely missed something.

Interrupts and exceptions look at %cs in the IRET frame to judge whether
to SWAPGS or not (and this is one of the main things that paranoid_entry
does differently).  In the case of the attack, there's no IRET frame
pushed on the stack and the read of %cs is out-of-bounds, most likely
the stack frame of the function which followed the corrupt function pointer.
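
To make the check concrete, here is a minimal C model of it. This is only a sketch, not the kernel's actual entry assembly: the struct layout, the function names, the selector values 0x10 and 0x33, and the stale stack word are all assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the five words the CPU pushes on a 64-bit
 * interrupt or exception: RIP, CS, RFLAGS, RSP, SS. */
struct iret_frame {
    uint64_t rip;
    uint64_t cs;
    uint64_t rflags;
    uint64_t rsp;
    uint64_t ss;
};

/* Non-paranoid entry logic: SWAPGS only when the saved %cs says the
 * interrupted context was not CPL 0 (low two selector bits nonzero). */
static bool entry_would_swapgs(const struct iret_frame *frame)
{
    return (frame->cs & 3) != 0;
}

/* Small self-check covering the genuine and the forged cases. */
void demo_cpl_check(void)
{
    /* Genuine frames; 0x10 and 0x33 are assumed selector values. */
    struct iret_frame from_kernel = { .cs = 0x10 };
    struct iret_frame from_user   = { .cs = 0x33 };
    assert(!entry_would_swapgs(&from_kernel));
    assert(entry_would_swapgs(&from_user));

    /* Attack case: no frame was pushed, so the "cs" slot is whatever
     * stack data sits there, e.g. a stale return address (made up). */
    struct iret_frame garbage = { .cs = 0xffffffff81234567ULL };
    assert(entry_would_swapgs(&garbage)); /* low bits 0b11: SWAPGS */
}
```

In the forged case the outcome depends purely on the low two bits of unrelated stack data, which is why the handler cannot be trusted to pick the right GS base.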

Thank you for your detailed explanation.

The SYSCALL entrypoint is simply the easiest to pivot on, but all can be
attacked in this manner.  Fixing only the SYSCALL entrypoint doesn't
improve things much.

Maybe a more elegant and cheaper check of IDT entry "authenticity" would be to check the current %ss, which needs to be NULL, and possibly to check the %cs in the stack frame
against the full kernel %cs selector rather than just the two CPL bits, and/or to perform more checks.
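
A rough C sketch of the "compare the full selector" part of this idea follows. The selector values, the enum, and the function names are assumptions for illustration only; a real check would also have to accept any other legitimate selectors (for example compat-mode user code), and the sketch assumes the saved slot holds the zero-extended selector.

```c
#include <assert.h>
#include <stdint.h>

#define KERNEL_CS 0x10u /* assumed value of the kernel code selector */
#define USER_CS   0x33u /* assumed value of the 64-bit user code selector */

enum entry_origin { FROM_KERNEL, FROM_USER, FRAME_BOGUS };

/* Classify the saved %cs by exact match instead of by its CPL bits. */
static enum entry_origin classify_saved_cs(uint64_t saved_cs)
{
    if (saved_cs == KERNEL_CS)
        return FROM_KERNEL;
    if (saved_cs == USER_CS)
        return FROM_USER;
    return FRAME_BOGUS; /* not a selector we ever expect: treat as attack */
}

/* Self-check: stale stack data is rejected whatever its low bits are. */
void demo_full_cs_check(void)
{
    assert(classify_saved_cs(0x10) == FROM_KERNEL);
    assert(classify_saved_cs(0x33) == FROM_USER);
    /* Made-up pointer-like values that a CPL-bits-only test would
     * accept as "kernel" (low bits 00) or "user" (low bits 11). */
    assert(classify_saved_cs(0xffffffff81234560ULL) == FRAME_BOGUS);
    assert(classify_saved_cs(0xffffffff81234567ULL) == FRAME_BOGUS);
}
```

The exact match costs one or two extra compares per entry, but it still passes if the stale stack slot happens to hold a valid selector value, so it narrows the window rather than closing it.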

Some other ideas, if you think this topic is still worth discussing:

What about using a completely different %cs selector for all entry code? The early entry code would check the %cs selector and panic if it is the wrong one.

After the SWAPGS dance, we would need to perform a far jump to the normal kernel %cs, which might cost something.

To fix the interrupt-on-fake-entry problem, we could check in the relevant IDT handlers that we never come from the "completely different" %cs used above for the early entry code.
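
The separate-selector idea can be modelled in a few lines of C. Everything here is hypothetical: ENTRY_CS and its value 0x60 are invented, KERNEL_CS is an assumed value, and the helper names exist only for this sketch. The model relies on the fact that a genuine entry (IDT gate or SYSCALL) loads %cs from a descriptor or MSR, while a near branch through a corrupt function pointer leaves %cs unchanged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define KERNEL_CS 0x10u /* assumed value of the normal kernel selector */
#define ENTRY_CS  0x60u /* hypothetical selector reserved for entry stubs */

enum transfer { VIA_GATE, VIA_NEAR_BRANCH };

/* %cs as seen by the first instruction of an entry stub. */
static uint16_t cs_at_entry(enum transfer how, uint16_t cs_before)
{
    return how == VIA_GATE ? ENTRY_CS : cs_before;
}

/* Early entry check: anything but ENTRY_CS means we were jumped to. */
static bool early_entry_cs_ok(uint16_t live_cs)
{
    return live_cs == ENTRY_CS;
}

/* Handler-side check from the proposal: the saved %cs in the frame
 * must never be the selector reserved for the early entry code. */
static bool handler_saved_cs_ok(uint16_t saved_cs)
{
    return saved_cs != ENTRY_CS;
}

void demo_entry_cs(void)
{
    /* Genuine entry: hardware switched to ENTRY_CS. */
    assert(early_entry_cs_ok(cs_at_entry(VIA_GATE, KERNEL_CS)));
    /* Forged entry: near branch keeps KERNEL_CS, so we would panic. */
    assert(!early_entry_cs_ok(cs_at_entry(VIA_NEAR_BRANCH, KERNEL_CS)));
    /* Handler check accepts the normal kernel selector only. */
    assert(handler_saved_cs_ok(KERNEL_CS));
    assert(!handler_saved_cs_ok(ENTRY_CS));
}
```

The live %cs would be read with something like `mov %cs, %eax` in the stub; the far jump back to the normal selector is the part whose cost would need measuring.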

And the very last idea would be to somehow persuade Last Branch Recording to record only exception entries, and then just check it via an MSR. But maybe that is too costly and/or not possible.

Thanks,
Rudolf


