On 02/10/2019 10:50, Wieczorkiewicz, Pawel wrote:
>>>>> I've taken, as a simple example,
>>>>> p2m_mem_access_sanity_check(), and this looks to be the way gcc9 has 
>>>>> generated
>>>>> code (in a non-debug build). Hence either I'm mis-remembering what we want
>>>>> things to look like, or there's more to it than code generation simply 
>>>>> being
>>>>> "not correct".
>>>> This example demonstrates the problem, and actually throws a further
>>>> spanner in the works of how make this safe, which hadn't occurred to me
>>>> before.
>>>>
>>>> The instruction stream from a caller of p2m_mem_access_sanity_check()
>>>> will look something like this:
>>>>
>>>> call p2m_mem_access_sanity_check
>>>>     ...
>>>>     lfence
>>>>     ...
>>>>     ret
>>>> cmp $0, %eax
>>>> jne ...
>>>>
>>>> Which is unsafe, because the only safe way to arrange this code would be:
>>>>
>>>> call p2m_mem_access_sanity_check
>>>>     ...
>>>>     ret
>>>> cmp $0, %eax
>>>> jne 1f
>>>> lfence
>>>> ...
>>>> 1: lfence
>>>> ...
>>>>

Answering out of order, because I think this will make things clearer.

> But the hardening wasn’t about spectre v1, but about cache-load gadgets?

Ultimately, yes - the goal is cache load gadgets.

A cache load gadget is any memory read where the attacker can influence
the position of the load.  The easy case to think about is the first
half of a Spectre v1 gadget (i.e. the first memory load), but a second
common case is a simple memory read with the wrong base pointer (as
demonstrated clearly by SpectreGS and CVE-2019-1125).

A 3rd case, which is actually the root of this discovery, is speculative
type confusion where the processor executes code expecting to use
{d,v}->arch.{pv,hvm}.$FOO, but is interpreting the data with the types
of the other union.  For people familiar with Speculative Store Bypass
gadgets, this is the same kind of thing as the integer/pointer confusion
in that case.
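
To make the first two concrete, here is a hypothetical sketch (the
function, array and field names are all made up, not real Xen code):

/* Hypothetical sketch - the names here are made up, not real Xen code. */
#define TABLE_SIZE 64
static unsigned long table[TABLE_SIZE];

/* Cut-down stand-in for the real struct vcpu. */
struct vcpu { unsigned long some_field; };

/*
 * Flavour 1: the first half of a Spectre v1 gadget.  The bounds check
 * can be speculated past, at which point table[idx] performs, and
 * thereby cache-loads, an attacker-chosen out-of-bounds read.
 */
unsigned long gadget_v1(unsigned long idx)
{
    if ( idx < TABLE_SIZE )
        return table[idx];

    return 0;
}

/*
 * Flavour 2: SpectreGS-style bad base pointer.  The read itself is a
 * perfectly ordinary field access; the problem is that 'v' may be the
 * wrong (attacker-influenced) pointer under speculation.
 */
unsigned long gadget_base(const struct vcpu *v)
{
    return v->some_field;
}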

The only viable fix for these is to avoid entering the basic block with
the vulnerable code pattern in the first place.  I.e. "fixing" Spectre v1.

> Does it mean the CPU speculates over a direct call (assuming no #PF etc) and
> assumes some default return value to be used?

That wasn't the point I was trying to make, although Intel CPUs will
speculatively execute the instructions following an indirect call/jmp
while the frontend works out where to fetch from next.

The point was that, to avoid entering the wrong basic block, the lfence
must be later in the instruction stream than the conditional jump.  The
frontend has to follow the attacker controlled jump, then serialise
itself to figure out if it went wrong.

Serialising first, then following the attacker controlled jump still
leaves a speculation window, which can execute the gadget.

> If not, maybe we should be more concerned about the value the cache-loading
> gadget speculates with, rather than the sheer speculation over the branch.

In a mythical world where a complete solution existed, channels other
than the data cache want considering.  There is some interesting work
with instruction cache and TLB timing analysis, and these are far harder
to close.

Given the SpectreGS case of a bad base pointer, rather than a bad
offset, a non-spectre-v1 fix would have to clamp every register making
up part of a memory operand.

This is the approach which the Clang/LLVM Speculative Load Hardening
feature goes for, and comes with a 30% perf hit or so.  Furthermore, the
current design of SLH is specific to userspace, and there is development
work needed to make it safe for kernel mode code.

Specifically, the muxing of the speculation token against registers
needs to turn from &= token to |= token, and the sense of the token
being 0 or -1 needs to reverse, because of kernel code operating in the
top of the address space, rather than the bottom.  There is a further
complication given signed displacement.  For kernel code, that can still
speculatively wrap around into userspace, and SMAP (on
meltdown-vulnerable parts) won't even halt speculation in this case.
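
As a rough illustration of that masking scheme (this is not LLVM's
actual SLH codegen, just the transformation described above with
made-up helper names):

#include <stdint.h>

/*
 * 'token' is the speculation predicate: it leaves the pointer alone on
 * the architecturally correct path and clamps it under misspeculation.
 */

/* Userspace-style SLH: token is ~0UL on the good path and 0 when
 * misspeculating, so a clamped pointer collapses towards address 0. */
static inline const void *clamp_user(const void *p, uintptr_t token)
{
    return (const void *)((uintptr_t)p & token);
}

/* Kernel-style variant argued for above: the sense of the token
 * reverses (0 on the good path, ~0UL when misspeculating) and the mux
 * becomes an OR, so a misspeculated pointer collapses towards the top
 * of the address space rather than into guest/user controlled memory. */
static inline const void *clamp_kernel(const void *p, uintptr_t token)
{
    return (const void *)((uintptr_t)p | token);
}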

Further further more, for Xen, that would still move incorrect
speculative memory accesses into PV guest kernel controlled space,
rather than Xen controlled space.

>
>> To protect against speculatively executing the wrong basic block, the
>> pipeline must execute the conditional jump first, *then* hit an lfence
>> to serialise the instruction stream and revector in the case of
>> incorrect speculation.
> That’s true, but there are also 2 aspects worth mentioning:
> 1) Example:
>
> jne 1
> PRIVILEGED
> 1:
> ALWAYS_SAFE
>
> We do not necessarily have to cover the 1: path with an lfence?
> We could allow speculation there, as it is harmless.

I agree, but how do you express that using evaluate_nospec()?

There is no information tying what is privileged and what is safe to
the condition for entering the basic block.
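
For reference, this is roughly the shape of the x86 implementation
(paraphrased; the real barrier lives behind alternative() patching):

#include <stdbool.h>

#define always_inline __inline__ __attribute__((__always_inline__))

static always_inline bool barrier_nospec_true(void)
{
    asm volatile ( "lfence" ::: "memory" );
    return true;
}

static always_inline bool evaluate_nospec(bool condition)
{
    return condition ? barrier_nospec_true() : !barrier_nospec_true();
}

Both arms dispense an identical barrier; nothing in the construct says
which side of the branch is the privileged one.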

> 2) cache-load protection
>
> It might be ok to speculate over a conditional branch, when we can
> guarantee that the value to be used in a dereference is not out-of-bound.

I agree (in principle, because the guarantee is implausible to make in
the general case) but...

> In that case an lfence is used to latch the value in the register. We can
> still speculate over the branch and reach the dereference, but with a sane 
> value.

... this is reasoning about the wrong basic block.  The analogous code is:

cmp ... #1
jcc 1f
    ALWAYS_SAFE
    lfence ("optimised" from the cmp #2 if statement)
    cmp ... #2
    jcc 1f
    PRIVILEGED
1:
ALWAYS_SAFE

This is saying that the spilled lfence from cmp2 protects PRIVILEGED
because it might reduce the speculative variability in registers.  There
are probably code sequences where that would be true, but there are
plenty of others where it would not be true.

This is because it is protecting cmp1's basic block (or at least, part
of it), not cmp2's.  It will protect against an attack which requires
poisoning both cmp1 and cmp2 in order to function, but that is orders
of magnitude harder for the attacker to detect and arrange than an
attack which simply involves poisoning cmp2 to enter PRIVILEGED with an
architecturally-correct register state intended for ALWAYS_SAFE.

> I agree that lfence might not give us 100% security in every potential case,
> but that is what "hardening" gives you...

This being a best-effort approach is fine.  There is no way it could
ever be called complete with a straight face.

If everything had been done using block_speculation(), that would also
be ok.  There is no way the optimiser can break the safety there,
because it cannot reposition the lfence instruction relative to any
memory references (although the subtlety that it can still be
repositioned relative to non-memory accesses makes it hard to reason
about in general).
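
That is because the barrier boils down to an asm block with a "memory"
clobber (again paraphrased from the real implementation):

static inline void block_speculation(void)
{
    /*
     * The "memory" clobber is what stops the compiler moving memory
     * accesses across the barrier.
     */
    asm volatile ( "lfence" ::: "memory" );
}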

The problem is that, at the moment, the optimiser is undermining the
safety which evaluate_nospec() is attempting to insert.
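
Restating the failure mode with a hypothetical caller (the structure,
field and callee are made up; evaluate_nospec() as sketched above):

#include <stdbool.h>

/* Hypothetical example - not real Xen code. */
struct domain { bool is_priv; };

void do_privileged_op(const struct domain *d);

static bool sanity_check(const struct domain *d)
{
    return evaluate_nospec(d->is_priv);
}

void caller(const struct domain *d)
{
    /*
     * Intended: an lfence on each path, after the conditional jump.
     * Nothing ties the barrier to the branch, so the optimiser is free
     * to merge the identical barriers from both arms, and in the
     * quoted p2m_mem_access_sanity_check() disassembly the lfence ends
     * up before the callee's ret, i.e. ahead of this cmp/jne, which is
     * the unsafe arrangement.
     */
    if ( sanity_check(d) )
        do_privileged_op(d);
}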

We have code which appears to be safe at the C level, and isn't.  A
false sense of security is arguably worse than no security.

The random sprinkling of lfences will reduce the amount of speculation,
but it is of very little use when it only protects against certain
more-complicated, and therefore rarer, gadgets while leaving the common
gadgets exploitable.

~Andrew