On Thu, Jan 04, 2018 at 10:25:35AM -0800, Linus Torvalds wrote:
> On Thu, Jan 4, 2018 at 10:17 AM, Alexei Starovoitov
> <alexei.starovoi...@gmail.com> wrote:
> >
> > Clearly Paul's approach to retpoline without lfence is faster.
> > I'm guessing it wasn't shared with amazon/intel until now and
> > this set of patches going to adopt it, right?
> >
> > Paul, could you share a link to a set of alternative gcc patches
> > that do retpoline similar to llvm diff ?
>
> What is the alternative approach? Is it literally just doing a
>
>       call 1f
> 1:    mov real_target,(%rsp)
>       ret
>
> on the assumption that the "ret" will always just predict to that "1"
> due to the call stack?

Pretty much.
Paul's writeup: https://support.google.com/faqs/answer/7625886

tldr: jmp *%r11 gets converted to:

	call set_up_target;
capture_spec:
	pause;
	jmp capture_spec;
set_up_target:
	mov %r11, (%rsp);
	ret;

where the capture_spec part will be looping speculatively.
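
For illustration only, a toy Python model of why the "ret" predicts to capture_spec while architecturally landing on the real target. This is a sketch, not a CPU simulation: the ToyCpu class and its method names are made up here. It assumes the return predictor takes its target from a return stack buffer (RSB) that is filled by "call" and is not updated by the store to (%rsp).

```python
# Toy model only; all names are hypothetical and real hardware is far more complex.
class ToyCpu:
    def __init__(self):
        self.stack = []  # architectural stack: return addresses in memory at %rsp
        self.rsb = []    # return stack buffer: what the return predictor consults

    def call(self, return_addr):
        # A call records the return address in memory and in the RSB.
        self.stack.append(return_addr)
        self.rsb.append(return_addr)

    def overwrite_return(self, target):
        # Models "mov %r11, (%rsp)": only memory changes, the RSB is untouched.
        self.stack[-1] = target

    def ret(self):
        # Returns (speculatively predicted target, architectural target).
        predicted = self.rsb.pop()
        actual = self.stack.pop()
        return predicted, actual


def retpoline_jump(real_target):
    cpu = ToyCpu()
    cpu.call("capture_spec")           # call set_up_target
    cpu.overwrite_return(real_target)  # mov %r11, (%rsp)
    return cpu.ret()                   # ret


predicted, actual = retpoline_jump("real_target")
print(predicted, actual)  # prints: capture_spec real_target
```

In this model the speculative path goes to capture_spec (the pause/jmp loop), and only the architectural path reaches the value that was in %r11.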