On Fri, Aug 23, 2019 at 11:07 PM Nadav Amit <na...@vmware.com> wrote:
>
> INVPCID is considerably slower than INVLPG of a single PTE, but it is
> currently used to flush PTEs in the user page-table when PTI is used.
>
> Instead, it is possible to defer TLB flushes until after the user
> page-tables are loaded. Preventing speculation over the TLB flushes
> should keep the whole thing safe. In some cases, deferring TLB flushes
> in such a way can result in more full TLB flushes, but arguably this
> behavior is oftentimes beneficial.
I have a somewhat horrible suggestion. Would it make sense to refactor this so that it works for user *and* kernel tables? In particular, if we flush a *kernel* mapping (vfree, vunmap, set_memory_ro, etc.), we shouldn't need to send an IPI to a task that is running user code to flush most kernel mappings or even to free kernel pagetables. The same trick could be done if we treat idle like user mode for this purpose.

In code, this could mostly consist of changing all the "user" data structures involved to something like struct deferred_flush_info and having one for user and one for kernel.

I think this is horrible because it will enable certain workloads to work considerably faster with PTI on than with PTI off, and that would be a barely excusable moral failing. :-p

For what it's worth, other than register clobber issues, the whole "switch CR3 for PTI" logic ought to be doable in C. I don't know a priori whether that would end up being an improvement.
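A rough userspace model of the "one deferred_flush_info for user, one for kernel" idea, just to make it concrete. Everything here other than the struct deferred_flush_info name (struct cpu_flush_state, defer_flush(), flush_kernel_range(), kernel_entry(), return_to_user(), FLUSH_MAX_PAGES, and the simulated INVLPG/full-flush counters) is hypothetical and is not existing kernel code. It is single-threaded, so it ignores the cross-CPU synchronization and the speculation barriers that real code would need.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical threshold: beyond this many queued pages, do one full flush. */
#define FLUSH_MAX_PAGES 4

/* Pending flushes for one kind of mapping (user or kernel) on one CPU. */
struct deferred_flush_info {
	unsigned long addrs[FLUSH_MAX_PAGES];
	int nr;          /* queued single-page flushes */
	bool need_full;  /* too many pages queued: flush everything instead */
};

/* Hypothetical per-CPU state: one deferred_flush_info for user, one for kernel. */
struct cpu_flush_state {
	struct deferred_flush_info user;
	struct deferred_flush_info kernel;
	bool in_user_or_idle;  /* idle is treated like user mode */
};

/* Simulated hardware operations, counted so the model can be checked. */
static int invlpg_count, full_flush_count;
static void simulated_invlpg(unsigned long addr) { (void)addr; invlpg_count++; }
static void simulated_full_flush(void) { full_flush_count++; }

/* Queue one page; escalate to a full flush once the queue would overflow. */
static void defer_flush(struct deferred_flush_info *d, unsigned long addr)
{
	if (d->need_full)
		return;
	if (d->nr == FLUSH_MAX_PAGES) {
		d->need_full = true;
		d->nr = 0;
		return;
	}
	d->addrs[d->nr++] = addr;
	assert(d->nr <= FLUSH_MAX_PAGES);
}

/* Perform whatever was queued, then reset the queue. */
static void run_deferred_flushes(struct deferred_flush_info *d)
{
	if (d->need_full) {
		simulated_full_flush();
	} else {
		for (int i = 0; i < d->nr; i++)
			simulated_invlpg(d->addrs[i]);
	}
	d->nr = 0;
	d->need_full = false;
}

/*
 * A kernel mapping changed (vfree, vunmap, ...). Returns true if the target
 * CPU would need an IPI; a CPU in user mode or idle just gets the flush queued.
 */
static bool flush_kernel_range(struct cpu_flush_state *cpu, unsigned long addr)
{
	if (cpu->in_user_or_idle) {
		defer_flush(&cpu->kernel, addr);
		return false;
	}
	return true;
}

/* Entry from user mode or idle: apply queued kernel flushes first. */
static void kernel_entry(struct cpu_flush_state *cpu)
{
	cpu->in_user_or_idle = false;
	run_deferred_flushes(&cpu->kernel);
}

/* Return to user: the user CR3 would be loaded here, then user flushes run. */
static void return_to_user(struct cpu_flush_state *cpu)
{
	run_deferred_flushes(&cpu->user);
	cpu->in_user_or_idle = true;
}
```

In this model, two kernel-mapping changes made while the CPU runs user code cost no IPIs and become two INVLPGs at the next kernel entry, while six queued user pages exceed the hypothetical threshold of four and collapse into a single full flush on the way back to user mode.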