On Fri, Jan 5, 2018 at 3:26 AM, Paolo Bonzini <pbonz...@redhat.com> wrote:
> On 05/01/2018 11:28, Paul Turner wrote:
>>
>> The "pause; jmp" sequence proved minutely faster than "lfence;jmp" which is
>> why it was chosen.
>>
>>   "pause; jmp"   33.231 cycles/call   9.517 ns/call
>>   "lfence; jmp"  33.354 cycles/call   9.552 ns/call
>
> Do you have timings for a non-retpolined indirect branch with the
> predictor suppressed via IBRS=1?  So at least we can compute the break
> even point.

The data I collected here previously had the run-time cost as a wash.
On Skylake, an indirect branch with IBRS=1 and a retpolined indirect
branch had costs within a few cycles of each other.

The costs to consider when making a choice here are:

- The transition overheads: how frequently you will be switching in
  and out of protected code (IBRS needs to be enabled and disabled at
  these boundaries).
- The frequency at which you will be executing protected code on one
  sibling and unprotected code on another (enabling IBRS may affect
  sibling execution, depending on SKU).
- The implementation cost (retpoline requires auditing/rebuilding your
  target, while IBRS can be used out of the box).

>
> Paolo
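
For the break-even question, here is a minimal sketch of the arithmetic, covering only the first cost above (transition overhead). It assumes each entry into protected code costs one MSR write to set IBRS and each exit costs one to clear it. The function names and the MSR write cost parameter are hypothetical, not measurements from this thread; the sibling-thread and implementation costs are not modelled.

```python
def ibrs_cycles(branches, transitions, branch_cost, msr_write_cost):
    """Total cycles with IBRS: per-branch cost plus two MSR writes
    (enable on entry, disable on exit) per transition. Assumed model."""
    return branches * branch_cost + transitions * 2 * msr_write_cost


def retpoline_cycles(branches, branch_cost):
    """Total cycles with retpoline: per-branch cost only, no boundary cost."""
    return branches * branch_cost


def break_even_branches_per_transition(retpoline_cost, ibrs_branch_cost,
                                       msr_write_cost):
    """Indirect branches per protected-code entry above which IBRS is
    cheaper than retpoline. Returns None when IBRS never catches up,
    i.e. when its per-branch cost is not lower than retpoline's."""
    saving = retpoline_cost - ibrs_branch_cost
    if saving <= 0:
        return None
    return 2 * msr_write_cost / saving


if __name__ == "__main__":
    # 33.231 cycles/call is the "pause; jmp" figure quoted above.
    # Treating the IBRS=1 branch cost as equal models "a wash";
    # the MSR write cost of 100 cycles is a placeholder, not a measurement.
    print(break_even_branches_per_transition(33.231, 33.231, 100.0))
```

Under this model, if the per-branch costs really are a wash, there is no break-even point and the choice comes down to the transition frequency and the other two costs listed above.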