On Sat, 2018-01-13 at 08:09 -0800, H.J. Lu wrote:
> > Again please extend both documentation hunks so it is clear what is purpose
> > of this hack.
>
> David, can you help here?
On most older CPUs the indirect branch issue is limited to actual indirect branches. On Skylake-era CPUs, however, an underflow of the RSB (return stack buffer) caused by a call/ret imbalance (such as on context switch) will cause predictions to come from the same problematic branch predictor, essentially allowing 'ret' instructions to be targeted by an attacker in precisely the same way as indirect branches.

Note that there are plenty of other causes for RSB underflow. Like taking an SMI, which clears the RSB completely. Or various other things. Including a call stack deeper than 16 function calls.

The -mfunction-return option was an experiment to use the retpoline approach for 'ret' too. I forget the implementation (I could look upthread), but essentially it was equivalent to replacing ret with 'pop %r12; jmp __x86_indirect_thunk_r12' so that you *never* deplete the RSB because of the 'call;ret' trick in the retpoline itself. Hence your exposure on Skylake was reduced to the possibility of taking an SMI while *in* the retpoline.

This would, of course, be forcing a mispredict/pipeline stall on every 'ret', rather than only on every indirect branch as in the original retpoline idea. HJ added the code, but I'm not sure anyone at Intel ever did actually do the *testing* to establish the performance characteristics. Dave/Arjan?

For my part, right *now* the kernel doesn't use this option. But then, we don't have a comprehensive answer for Skylake yet other than "use the new microcode features". Which are slower than retpoline, but not as *much* slower on Skylake as they are on other CPUs.
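
To make the RSB bookkeeping above a bit more concrete, here is a toy Python model. It is only a sketch under stated assumptions and does not describe real hardware: the RSB is treated as a 16-entry stack that silently drops its oldest entry on overflow, and the thunked 'ret' is reduced to the balanced call/ret pair inside the retpoline. The names ToyRSB, unwind_plain and unwind_thunked are made up for illustration.

```python
from collections import deque

RSB_DEPTH = 16  # depth mentioned above; a model parameter here, not a hardware claim


class ToyRSB:
    """Toy return stack buffer: fixed depth, oldest entry dropped on overflow."""

    def __init__(self, depth=RSB_DEPTH):
        self.entries = deque(maxlen=depth)
        self.underflows = 0

    def call(self, return_addr):
        # A 'call' pushes a predicted return address.
        self.entries.append(return_addr)

    def ret(self):
        # A 'ret' pops a prediction; an empty buffer counts as an underflow,
        # i.e. the case where prediction falls back to the other predictor.
        if self.entries:
            return self.entries.pop()
        self.underflows += 1
        return None

    def flush(self):
        # Models an event that empties the RSB (an SMI, in the text above).
        self.entries.clear()


def unwind_plain(call_depth):
    """Nest call_depth calls, then return from all of them with plain 'ret'."""
    rsb = ToyRSB()
    for addr in range(call_depth):
        rsb.call(addr)
    for _ in range(call_depth):
        rsb.ret()
    return rsb.underflows


def unwind_thunked(call_depth):
    """Same nesting, but every return goes through a retpoline-style thunk."""
    rsb = ToyRSB()
    for addr in range(call_depth):
        rsb.call(addr)
    for _ in range(call_depth):
        rsb.call("thunk")  # the 'call' inside the thunk pushes an entry
        rsb.ret()          # the thunk's 'ret' consumes exactly that entry
    return rsb.underflows
```

In this model unwind_plain(20) counts 4 underflows (the four returns below the 16 remembered frames), while unwind_thunked() never underflows at any depth. Calling flush() between the thunk's call and its ret is the one way to get an underflow in the thunked case, which corresponds to the "SMI while *in* the retpoline" window.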