Kumar Kartikeya Dwivedi <mem...@gmail.com> writes:

> Back when all of this surfaced, compiler folks came up with another
> solution, to rely on Intel's guarantee that conditional moves are not
> predicted.
>
> if (condition) {
>     mask = !condition ? 0UL : ~0UL; // CMOVcc
>     ptr &= mask;
>     x = *ptr;
> }
>
> In case the condition being true in the speculative domain leads to
> problems, the speculative domain will just read from NULL and not leak
> sensitive data.
Yes, that is an alternative approach.

> The assumption is that cost of instrumentation in speculative domain <
> completely stalling it until prior instructions are done using lfence.
> So speculation is still helpful when the branch is not mispredicted.
> Now I imagine it's not fun to do such analysis in the verifier (I've
> tried), but theoretically we could break it down into emitting
> bytecode from the compiler side, and lifting the compiler to do it for
> us, and ensuring the end result produced is sane (by still following
> speculative paths) from the verifier.
>
> You talk about this in the paper and in the presentation as future work.
> My question is mainly whether you considered implementing this, if
> yes, what made you choose a nospec barrier over something like above?

The primary motivation was certainly that it's the easiest to implement
with the current verifier design. I mostly decided not to pursue the
"verification-only" approach (and insert the insn in LLVM) because it
would require changes to the eBPF instruction set. Other considerations
include:

* The approach could potentially improve performance (the compiler
  could for example compute the minimal cut to reduce the number of
  fences) and simplify the verifier to some extent (no more inserting
  of insns).

* It could have the downside that non-/partially-vulnerable
  architectures cannot benefit from improved performance as easily as
  is the case with the current design.

* The best choice for the instruction-set extension is not clear to me.
  For Spectre v1, USLH [1] would suffice and then one only needs a cmov,
  so that's easy. But this does not cover Spectre v4 (which is currently
  the main source of overhead). It could be 'nospec_vX rY' to tell the
  verifier that a certain register must not be used under speculation
  from a specific variant, or something generic/catch-all like the
  current 'nospec'.

* From a security perspective, LLVM SLH is not as strong as the
  verifier's Spectre v1 mitigation.
This is because it does not harden secret-dependent control flow as
shown in [1] while the Linux verifier does (where "secrets" are
unreadable/uninitialized registers and kernel pointers). It may be the
case that this is not a problem for eBPF by coincidence, because the
verifier also restricts secret-dependent control flow. Without looking
into it in detail I am not sure. If one finds that it is a problem, it
may also not be important to fix if we adopt the verification-only
approach you mention, or one could change LLVM to extend the
mitigation.

> Was it the complexity of realizing this during verification?
> Are there any implications of reading from NULL that would cause problems?

In theory yes, in practice I would assume no and that it works out. I
am not aware of any documents from Intel / ARM that state that
accessing NULL speculatively acts as a speculation barrier (I remember
reading some paper that suggested it practically does, but I cannot
find it now). If it does not (portably), a downside would be that the
verifier will have to analyze a lot more speculative instructions.

> Also, did you characterize how much difference it could make?

[1] has SPEC2017 benchmarks for LLVM-/U-SLH and a naive lfence-based
approach (lfence after every branch); for these, USLH is about twice as
fast (150%) as the naive fence-based approach (300%). But this is only
for Spectre v1, and the Spectre v4 overhead would have to be added.
Both numbers are also very high compared to the programs from the
VeriFence paper. There the *combined* overhead for Spectre v1 and v4
was 0% for very small programs and 16%-60% for larger programs. I have
since also measured the overhead for Katran and there it is 100%-150%.
I am currently working on a prototype to reduce the Spectre v4 (and
Spectre v1) overhead, and for Katran I was able to lower it to 15%-30%
by using a more precise analysis of the speculative execution with a
fence-based approach.
Most remaining fences are now still from Spectre v4 (not v1, which
would be addressed by SLH) and I hope to eliminate some more using an
SLH-style approach for v4. I will of course also have to check how this
carries over to other programs, but it certainly seems possible to
eliminate almost all fences because there are rarely any 'real' gadgets
in non-malicious programs (only false positives one cannot eliminate
without reasoning about the cache).

> The drop in SCTP throughput seems to suggest so, since CPU-bound
> computation was moved into the program.
> Otherwise most programs mostly defer to helpers for heavy lifting.
> Not that it was as fast as a helper would be, even without nospec,
> but still.
>
> Also a bit sad we don't split the program into BBs already, which
> could help reduce your mitigation's cost + plus also reduce cost of
> instruction patching (unrelated).

In the prototype I mention I also tried tackling that. However, at
least for Katran it was uncommon to have more than one v1-induced fence
per basic block. Therefore it might not be worth it.

> Anyway, all that said, this is valuable stuff, so I was just curious.

[1] https://www.usenix.org/system/files/usenixsecurity23-zhang-zhiyuan-slh.pdf
    ("Ultimate SLH: Taking Speculative Load Hardening to the Next
    Level", Section 5.1: Exploiting Secret-Dependent Control Flow)