On 02/10/2019 10:50, Wieczorkiewicz, Pawel wrote: >>>>> I've taken, as a simple example, >>>>> p2m_mem_access_sanity_check(), and this looks to be the way gcc9 has >>>>> generated >>>>> code (in a non-debug build). Hence either I'm mis-remembering what we want >>>>> things to look like, or there's more to it than code generation simply >>>>> being >>>>> "not correct". >>>> This example demonstrates the problem, and actually throws a further >>>> spanner in the works of how make this safe, which hadn't occurred to me >>>> before. >>>> >>>> The instruction stream from a caller of p2m_mem_access_sanity_check() >>>> will look something like this: >>>> >>>> call p2m_mem_access_sanity_check >>>> ... >>>> lfence >>>> ... >>>> ret >>>> cmp $0, %eax >>>> jne ... >>>> >>>> Which is unsafe, because the only safe way to arrange this code would be: >>>> >>>> call p2m_mem_access_sanity_check >>>> ... >>>> ret >>>> cmp $0, %eax >>>> jne 1f >>>> lfence >>>> ... >>>> 1: lfence >>>> … >>>>
Answering out of order, because I think this will make things clearer. > But the hardening wasn’t about spectre v1, but about cache-load gadgets? Ultimately, yes - the goal is cache load gadgets. Cache load gadgets are any memory read, where the attacker can influence the position of the load. The easy case to think about is the first half of a Spectre v1 gadget (i.e. the first memory load), but a second common case is a simple memory read with the wrong base pointer (as demonstrated clearly by SpectreGS and CVE-2019-1125). A 3rd case, which is actually the root of this discovery, is speculative type confusion where the processor executes code expecting to use {d,v}->arch.{pv,hvm}.$FOO, but is interpreting the data with the types of the other union. For people familiar with Speculative Store Bypass gadgets, this is the same kind of thing as the integer/pointer confusion in that case. The only viable fix for these is to avoid entering the basic block with the vulnerable code pattern in the first place. I.e. "fixing" Spectre v1. > Does it mean the CPU speculates over a direct call (assuming no #PF etc) and > assumes some default return value to be used? That wasn't the point I was trying to make, although Intel CPUs will speculatively execute the instructions following an indirect call/jmp while the frontend works out where to fetch from next. The point was that, to avoid entering the wrong basic block, the lfence must be later in the instruction stream than the conditional jump. The frontend has to follow the attacker controlled jump, then serialise itself to figure out if it went wrong. Serialising first, then following the attacker controlled jump still leaves a speculation window, which can execute the gadget. > If not, maybe we should be more concerned about the value the cache-loading > gadget speculates with, rather than the sheer speculation over the branch. In a mythical world where a complete solution existed, channels other than the data cache want considering. There is some interesting work with instruction cache and TLB timing analysis, and these are far harder to close. Given the SpectreGS case of a bad base pointer, rather than a bad offset, a non-spectre-v1 fix would have to clamp every register making up part of a memory operand. This is the approach which the Clang/LLVM Speculative Load Hardening feature goes for, and comes with a 30% perf hit or so. Furthermore, the current design of SLH is specific to userspace, and there is development work needed to make it safe for kernel mode code. Specifically, the muxing of the speculation token against registers needs to turn from &= token to |= token, and the sense of the token being 0 or -1 needs to reverse, because of kernel code operating in the top of the address space, rather than the bottom. There is a further complication given signed displacement. For kernel code, that can still speculatively wrap around into userspace, and SMAP (on meltdown-vulnerable parts) won't even halt speculation in this case. Further further more, for Xen, that would still move incorrect speculative memory accesses into PV guest kernel controlled space, rather than Xen controlled space. > >> To protect against speculatively executing the wrong basic block, the >> pipeline must execute the conditional jump first, *then* hit an lfence >> to serialise the instruction stream and revector in the case of >> incorrect speculation. > That’s true, but there are also 2 aspects worth mentioning: > 1) Example: > > jne 1 > PRIVILEGED > 1: > ALWAYS_SAFE > > We do not necessarily have to cover the 1: path with an lfence? > We could allow speculation there, as it is harmless. I agree, but how do you express that using evaluate_nospec()? There is no information tying what is privileged and what is safe, to the condition for entering the basic block. > 2) cache-load protection > > It might be ok to speculate over a conditional branch, when we can > guarantee that the value to be used in a dereference is not out-of-bound. I agree (in principle, because the guarantee is implausible to make in general case) but... > In that case an lfence is used to latch the value in the register. We can > still speculate over the branch and reach the dereference, but with a sane > value. ... this is reasoning about the wrong basic block. This analogy is: cmp ... #1 jcc 1f ALWAYS_SAFE lfence ("optimised" from the cmp #2 if statement) cmp ... #2 jcc 1f PRIVILEGED 1: ALWAYS_SAFE This is saying that the spilled lfence from cmp2 protects PRIVLEGED because it might reduce the speculative variability in registers. There are probably code sequences where that would be true, but there are plenty of others where it would not be true. This is because it is protecting cmp1's basic block (or at least, part of it), not cmp2's. It will protect against an attack which requires poisoning of both cmp1 and cmp2 to be function, but this is orders of magnitude harder for the attacker to detect and arrange for, than an attack which simply involves poisoning cmp2 to enter PRIVILEGED with an architecturally-correct register state intended for ALWAYS_SAFE. > I agree that lfence might not give us 100% security in every potential case, > but that is what "hardening" gives you... This being a best-effort approach is fine. There is no way it could ever be called complete with a straight face. If everything had been done using block_speculation(), that would also be ok. There is no way the optimiser can break the safety there, because it cannot reposition the lfence instruction relative to any memory references (although the subtleties of it being able to be repositioned relative to non-memory accesses still make it hard to reason about in general). The problem is that, at the moment, the optimiser is undermining the safety which is attempting to be inserted by the use of evaluate_nospec(). We have code which appears to be safe at the C level, and isn't. A false sense of security is arguably worse than no security. The random sprinkling of lfences will reduce the amount of speculation, but this is of very little use when it is only actually protecting against certain more-complicated, and therefore rarer, gadgets but leaves the common gadgets still exploitable. ~Andrew
_______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel