On Sat, 2017-09-09 at 12:28 -0700, Andy Lutomirski wrote:
> I propose the following fix.  If PCID is on, then, in
> enter_lazy_tlb(), we switch to init_mm with the no-flush flag set.
> (And we give init_mm its own dedicated ASID to keep it simple and
> fast -- no need to use the LRU ASID mapping to assign one
> dynamically.)  We clear the bit in mm_cpumask.  That is, we more or
> less just skip the whole lazy TLB optimization and rely on PCID CPUs
> having reasonably fast CR3 writes.  No extra IPIs.

Avoiding the IPIs is probably what matters the most, especially
on systems with deep C states, and virtual machines where the
host may be running something else, causing the IPI service time
to go through the roof for idle VCPUs.

> Also, sorry Rik, this means your old increased laziness optimization
> is dead in the water.  It will have exactly the same speculative load
> problem.

Doesn't a memory barrier solve that speculative load problem?

The memory barrier could be added only to the path that potentially
skips reloading the TLB, under the assumption that a memory barrier
is cheaper than a TLB reload (even with ASID).
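
For reference, here is a rough userspace sketch of the no-flush switch
to init_mm described in the quoted proposal.  It is not kernel code and
not the actual patch.  The only architectural facts it relies on are
that, with CR4.PCIDE set, CR3 bits 11:0 carry the PCID and bit 63 asks
the CPU to keep the TLB entries tagged with that PCID.  The names
build_cr3(), lazy_switch_to_init_mm(), struct fake_mm and the value of
INIT_MM_ASID are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural CR3 layout when CR4.PCIDE = 1. */
#define CR3_PCID_MASK  0xFFFULL        /* bits 11:0 hold the PCID */
#define CR3_NOFLUSH    (1ULL << 63)    /* bit 63: keep TLB entries for this PCID */

/* Hypothetical: a fixed ASID reserved for init_mm (not a real kernel constant). */
#define INIT_MM_ASID   1u

/* Hypothetical stand-in for an mm; only models a 64-CPU mm_cpumask. */
struct fake_mm {
	uint64_t cpumask;
};

/* Compose the value that would be written to CR3. */
static uint64_t build_cr3(uint64_t pgd_phys, uint16_t asid, int noflush)
{
	uint64_t cr3;

	/* The page-table root must be page aligned, leaving bits 11:0 free. */
	assert((pgd_phys & CR3_PCID_MASK) == 0);

	cr3 = pgd_phys | (asid & CR3_PCID_MASK);
	if (noflush)
		cr3 |= CR3_NOFLUSH;
	return cr3;
}

/*
 * Model of the proposed enter_lazy_tlb() behaviour: drop this CPU from
 * the previous mm's cpumask so it no longer receives flush IPIs for it,
 * and return the CR3 value that switches to init_mm without flushing.
 */
static uint64_t lazy_switch_to_init_mm(struct fake_mm *prev, unsigned int cpu,
				       uint64_t init_pgd_phys)
{
	prev->cpumask &= ~(1ULL << cpu);
	return build_cr3(init_pgd_phys, INIT_MM_ASID, 1);
}
```

Calling lazy_switch_to_init_mm() for CPU 2 on an mm whose mask covers
CPUs 0 and 2 leaves only CPU 0 in the mask, so later flushes for that mm
would not need to IPI the idle CPU.  The real cost is the CR3 write on
every transition into and out of idle, which is the trade-off the
proposal accepts.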