On Thu, Jan 04, 2018 at 10:25:35AM -0800, Linus Torvalds wrote:
> On Thu, Jan 4, 2018 at 10:17 AM, Alexei Starovoitov
> <alexei.starovoi...@gmail.com> wrote:
> >
> > Clearly Paul's approach to retpoline without lfence is faster.

Using pause rather than lfence does not represent a fundamental
difference here.

A protected indirect branch always adds ~25-30 cycles of overhead.

That this can be avoided in practice is a function of two key factors:
(1) Kernel code uses fewer indirect branches.
(2) The overhead can be avoided for hot indirect branches via
    devirtualization, e.g. the semantic equivalent of

        if (ptr == foo)
                foo();
        else
                (*ptr)();

    This allows foo() to be called directly, even though it was
    provided as an indirect target.

> > I'm guessing it wasn't shared with amazon/intel until now and
> > this set of patches going to adopt it, right?
> >
> > Paul, could you share a link to a set of alternative gcc patches
> > that do retpoline similar to llvm diff ?
>
> What is the alternative approach? Is it literally just doing a
>
>       call 1f
> 1:    mov real_target,(%rsp)
>       ret
>
> on the assumption that the "ret" will always just predict to that "1"
> due to the call stack?
>
>               Linus
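
For concreteness, the devirtualization in (2) can be sketched in plain C roughly as follows. This is only an illustration, not code from any of the patch sets under discussion: the handler_fn type, the dispatch() helper and the second target bar() are hypothetical names, and the int argument is added just so the result is observable.

```c
/* Hypothetical handler signature, chosen only for this sketch. */
typedef int (*handler_fn)(int);

/* foo() stands in for the hot target; bar() is any other target. */
int foo(int x) { return x + 1; }
int bar(int x) { return x * 2; }

/*
 * Manual devirtualization: compare the pointer against the known hot
 * target and call it directly when it matches. Only the fallback path
 * performs an indirect call, which is the branch a retpoline-enabled
 * build would have to protect.
 */
int dispatch(handler_fn ptr, int arg)
{
	if (ptr == foo)
		return foo(arg);	/* direct call, no indirect branch */
	return (*ptr)(arg);		/* indirect call, pays the overhead */
}
```

Calling dispatch(foo, 41) takes the direct path and dispatch(bar, 21) takes the indirect one. The compare costs an extra conditional branch on every call, so this presumably only pays off where one target dominates.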