On 1/25/19 11:14, Jan Beulich wrote:
>>>> On 24.01.19 at 22:29, <andrew.coop...@citrix.com> wrote:
>> Worse is the "evaluate condition, stash result, fence, use variable"
>> option, which is almost completely useless.  If you work out the
>> resulting instruction stream, you'll have a conditional expression
>> calculated down into a register, then a fence, then a test register and
>> conditional jump into one of two basic blocks.  This takes the perf hit,
>> and doesn't protect either of the basic blocks against speculative
>> mis-execution.
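
A minimal C sketch of the "evaluate condition, stash result, fence, use variable" pattern, assuming GCC or Clang inline asm on x86 (other targets only get a compiler barrier in this sketch). `stash_fence_use()` and `pick_block()` are hypothetical names, not Xen interfaces. The `"+r"` operand forces the condition to be computed into a register before the LFENCE. The test and conditional jump for the caller's `if` are emitted after the fence, which gives the instruction stream described above.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: evaluate the condition, stash the result in a
 * register, fence, then return the stashed value for the caller to branch on.
 */
static inline bool stash_fence_use(bool cond)
{
    unsigned int stashed = cond; /* condition calculated into a register */

#if defined(__x86_64__) || defined(__i386__)
    /* "+r" forces 'stashed' to be computed before the LFENCE. */
    __asm__ __volatile__("lfence" : "+r"(stashed) : : "memory");
#else
    /* Non-x86 fallback in this sketch: compiler barrier only. */
    __asm__ __volatile__("" : "+r"(stashed) : : "memory");
#endif

    /*
     * The caller's "test reg, reg; jcc" comes after the fence, so neither
     * basic block reached through that jump is covered by it.
     */
    return stashed != 0;
}

/* Hypothetical caller with two basic blocks selected by the condition. */
static int pick_block(unsigned int index, unsigned int limit)
{
    if (stash_fence_use(index < limit))
        return 1; /* in-bounds block */
    return 2;     /* out-of-bounds block */
}
```
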
> How does it not protect anything? It shrinks the speculation window
> to just the register test and conditional branch, which ought to be
> far smaller than the window behind a memory access that misses all
> of the caches (and perhaps even all of the TLBs). This matters all
> the more because LFENCE specifically does not prevent instruction
> fetch from continuing.
>
> That said, I agree that the LFENCE would better sit between the
> register test and the conditional branch, but as we've said so many
> times before, this can't be achieved without compiler support. It's
> sad enough that the default "cc" clobber of asm()s on x86 alone
> prevents this from working, while my patch of over four years ago
> adding a means to avoid this has neither received enough comments
> to get it into some hopefully acceptable shape nor been approved
> as is.
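
For comparison, a hypothetical sketch of placing the LFENCE between the register test and the conditional branch by hand. Because x86 asm()s implicitly clobber the flags, the test, fence and jump are kept together in one `asm goto` statement (a GCC/Clang extension). `test_fence_branch()` is an illustrative name; this is neither what the series does nor the compiler change referred to above, and whether the caller's blocks are then reached directly from the `jnz` depends on inlining and optimisation.

```c
#include <stdbool.h>

/*
 * Hypothetical sketch: register test, LFENCE and conditional jump kept in
 * a single asm goto, so the flags never have to cross an asm() boundary.
 */
static inline bool test_fence_branch(bool cond)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int value = cond;

    __asm__ goto("test %0, %0\n\t"
                 "lfence\n\t"          /* LFENCE does not modify the flags */
                 "jnz %l[taken]"
                 : /* asm goto without outputs */
                 : "r"(value)
                 : "cc"
                 : taken);
    return false;
taken:
    return true;
#else
    /* No LFENCE equivalent in this sketch for other architectures. */
    return cond;
#endif
}
```
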
>
> Then again, as I said in an earlier reply on another sub-thread,
> nothing really prevents the compiler from going ahead and folding
> the two LFENCEs of the "both branches" model into one. It just so
> happens that this apparently never occurs right now (assuming
> Norbert has analysed the full generated code to confirm the
> intended placement).

I am happy to go back to my earlier version, without a configuration
option, that protects both branches with an lfence instruction using
logical operators. For this version, I actually looked into the object
dump and checked at various locations that the lfence instruction was
added to both blocks after the jump instruction. So the compiler I used
did not move the lfence instructions before the jump instruction and
merge them. I hope that the short-circuit (lazy) evaluation of the
logical operators keeps the compiler from doing so.
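
A minimal sketch of the "both branches" variant built from logical operators, assuming GCC or Clang inline asm; `fence_true()` and `evaluate_nospec_sketch()` are hypothetical stand-ins, not the series' actual definitions. Short-circuit evaluation means exactly one LFENCE runs on each path, after the branch on the condition. The compiler may still move volatile asm across jumps, so the placement has to be checked in the object dump as described above.

```c
#include <stdbool.h>

/* Hypothetical helper: fence, then report true (barrier only on non-x86). */
static inline bool fence_true(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ __volatile__("lfence" ::: "memory");
#else
    __asm__ __volatile__("" ::: "memory");
#endif
    return true;
}

/*
 * Hypothetical "both branches" model via short-circuit evaluation:
 *  - cond true:  '&&' evaluates fence_true(), '||' short-circuits -> true
 *  - cond false: '&&' short-circuits, '||' evaluates !fence_true() -> false
 * The condition is evaluated once and each outcome runs one fence.
 */
#define evaluate_nospec_sketch(cond) \
    (((cond) && fence_true()) || !fence_true())
```
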

A note on performance: I created a set of microbenchmarks that call
certain hypercall+command pairs many times in a tight loop. These
hypercalls target locations I modified with this patch series. The
current state of testing shows that, in the worst case, the full series
adds at most 3% runtime (relative to what the same hypercall took before
the modification). The testing used the evaluate_nospec implementation
that protects both branches via logical operators. Given that these are
microbenchmarks, I expect the impact on typical user workloads to be
even lower, but I have not measured any userland benchmarks yet. If you
point me to the performance tests you typically use, I can also look
into those. Thanks!
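
A rough sketch of how a tight-loop measurement and the relative-overhead figure could be computed. It assumes a bounds check guarding a table access as a stand-in for a hypercall+command pair; `bench_loop()`, `bench_seconds()` and `relative_overhead_pct()` are hypothetical, and the sketch times a user-space loop, not Xen hypercalls.

```c
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical fence helper: LFENCE on x86, compiler barrier elsewhere. */
static inline bool bench_fence_true(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ __volatile__("lfence" ::: "memory");
#else
    __asm__ __volatile__("" ::: "memory");
#endif
    return true;
}

/*
 * Hypothetical stand-in for a hypercall+command pair: a bounds check that
 * guards a table access, run in a tight loop. Every (size + 1)-th index is
 * out of bounds, so both outcomes of the check are exercised.
 */
static uint64_t bench_loop(const uint64_t *table, unsigned int size,
                           unsigned int iters, bool fenced)
{
    uint64_t sum = 0;

    for (unsigned int i = 0; i < iters; i++) {
        unsigned int idx = i % (size + 1);
        bool in_bounds = fenced
            ? (((idx < size) && bench_fence_true()) || !bench_fence_true())
            : (idx < size);

        if (in_bounds)
            sum += table[idx];
    }
    return sum;
}

/* Processor time in seconds for one variant; *sum keeps the result alive. */
static double bench_seconds(const uint64_t *table, unsigned int size,
                            unsigned int iters, bool fenced, uint64_t *sum)
{
    clock_t start = clock();

    *sum = bench_loop(table, size, iters, fenced);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Relative overhead in percent of the fenced run over the baseline run. */
static double relative_overhead_pct(double baseline, double fenced)
{
    return (fenced - baseline) / baseline * 100.0;
}
```

Calling `bench_seconds()` once with `fenced` false and once with it true on the same table, then passing both times to `relative_overhead_pct()`, yields a percentage of the same kind as the 3% figure. Both runs should produce the same sum, since the fences do not change the result.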

Best,
Norbert

>
> Jan
>
>




Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrer: Christian Schlaeger, Ralf Herbrich
Ust-ID: DE 289 237 879
Eingetragen am Amtsgericht Charlottenburg HRB 149173 B

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
