On Thu, Feb 13, 2025 at 2:31 AM Andrew Cooper <andrew.coop...@citrix.com> wrote:
> >> Assuming this is an issue you all feel is worth addressing, I will
> >> continue working on providing a patch. I'm concerned though that the
> >> overhead from adding a wrmsr on both syscall entry and exit to
> >> overwrite and restore the KERNEL_GS_BASE MSR may be quite high, so
> >> any feedback in regards to the approach or suggestions of alternate
> >> approaches to patching are welcome :)
> >
> > Since the kernel, as far as I understand, uses FineIBT without
> > backwards control flow protection (in other words, I think we assume
> > that the kernel stack is trusted?),
>
> This is fun indeed. Linux cannot use supervisor shadow stacks because
> the mess around NMI re-entrancy (and IST more generally) requires ROP
> gadgets in order to function safely. Implementing this with shadow
> stacks active, while not impossible, is deemed to be prohibitively
> complicated.
>
> Linux's supervisor shadow stack support is waiting for FRED support,
> which fixes both the NMI re-entrancy problem, and other exceptions
> nesting within NMIs, as well as prohibiting the use of the SWAPGS
> instruction as FRED tries to make sure that the correct GS is always in
> context.
>
> But, FRED support is slated for PantherLake/DiamondRapids which haven't
> shipped yet, so are no use to the problem right now.
>
> > could we build a cheaper
> > check on that basis somehow? For example, maybe we could do something like:
> >
> > ```
> > endbr64
> > test rsp, rsp
> > js slowpath
> > swapgs
> > ```
>
> I presume it's been pointed out already, but there are 3 related
> entrypoints here. SYSCALL64 which is discussed, SYSCALL32 and SYSENTER
> which are related.
>
> But, any other IDT entry is in a similar bucket. If we're corrupting a
> function pointer or return address to redirect here, then the check of
> CS(%rsp) to control the conditional SWAPGS is an OoB read in the callers
> stack frame.
>
> For IDT entries, checking %rsp is reasonable, because userspace can't
> forge a kernel-like %rsp. However, SYSCALL64 specifically leaves %rsp
> entirely attacker controlled (and even potentially non-canonical), so
> I'm wondering what you hand in mind for the slowpath to truly
> distinguish kernel context from user context?

Hm, yeah, that seems hard. Maybe the best we could do is to make sure that the inactive gsbase has the correct value for our CPU's kernel gsbase? Kind of like paranoid_entry, except more painful, because we'd first have to find a place to spill registers to before we can start using stuff like rdmsr...

Then a function pointer overwrite might still turn into returning to userspace via sysret with GPRs full of kernel pointers, but at least we wouldn't run off of a bogus gsbase anymore?
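
To make the idea a bit more concrete, here is a rough user-space C model of that check. It is only a sketch, not kernel code: `struct cpu_model`, `model_swapgs()`, `model_syscall_entry()` and the `percpu_base` field are names made up for illustration, and the model simply assumes the entry path already knows the expected per-CPU kernel gsbase, which is exactly the part that would be painful to do for real.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the two GS base registers of one logical CPU. */
struct cpu_model {
    uint64_t gs_base;        /* active base, models IA32_GS_BASE */
    uint64_t kernel_gs_base; /* inactive base, models IA32_KERNEL_GS_BASE */
    uint64_t percpu_base;    /* ASSUMPTION: expected kernel gsbase for this CPU */
};

/* SWAPGS exchanges the active and the inactive GS base. */
static void model_swapgs(struct cpu_model *cpu)
{
    uint64_t tmp = cpu->gs_base;

    cpu->gs_base = cpu->kernel_gs_base;
    cpu->kernel_gs_base = tmp;
}

/*
 * Hypothetical entry check: only swap if the inactive base holds the
 * expected kernel value. Returns true if the swap happened, false if the
 * entry would have to divert to a slow path instead.
 */
static bool model_syscall_entry(struct cpu_model *cpu)
{
    if (cpu->kernel_gs_base != cpu->percpu_base)
        return false; /* swapping now would leave us on a bogus gsbase */

    model_swapgs(cpu);
    return true;
}
```

In this model a normal entry from userspace (user value active, kernel value inactive) passes the check and swaps, while a hijacked jump to the entry point from kernel context (kernel value already active, user value inactive) fails it and leaves gsbase alone. It says nothing about what ends up in the GPRs on the way back out.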