On 1/25/19 11:14, Jan Beulich wrote:
>>>> On 24.01.19 at 22:29, <andrew.coop...@citrix.com> wrote:
>> Worse is the "evaluate condition, stash result, fence, use variable"
>> option, which is almost completely useless. If you work out the
>> resulting instruction stream, you'll have a conditional expression
>> calculated down into a register, then a fence, then a test register and
>> conditional jump into one of two basic blocks. This takes the perf hit,
>> and doesn't protect either of the basic blocks for speculative
>> mis-execution.
>
> How does it not protect anything? It shrinks the speculation window
> to just the register test and conditional branch, which ought to be
> far smaller than that behind a memory access which fails to hit any
> of the caches (and perhaps even any of the TLBs). This is the more
> that LFENCE does specifically not prevent insn fetching from
> continuing.
>
> That said I agree that the LFENCE would better sit between the
> register test and the conditional branch, but as we've said so many
> times before - this can't be achieved without compiler support. It's
> said enough that the default "cc" clobber of asm()-s on x86 alone
> prevents this from possibly working, while my over four year old
> patch to add a means to avoid this has not seen sufficient
> comments to get it into some hopefully acceptable shape, but also
> has not been approved as is.
>
> Then again, following an earlier reply of mine on another sub-
> thread, nothing really prevents the compiler from moving ahead
> and folding the two LFENCEs of the "both branches" model into
> one. It just so happens that apparently right now this never
> occurs (assuming Norbert has done full generated code analysis
> to confirm the intended placement).
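For reference, the two placements discussed in the quoted text can be
sketched roughly as follows. This is only an illustration, not the code
from the patch series: sketch_fence(), sketch_fence_true(),
evaluate_stash_sketch(), evaluate_both_sketch() and lookup_sketch() are
hypothetical names made up for this sketch, and the non-x86 fallback is
there only so that the example builds everywhere.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical fence helper for this sketch. On x86-64 with GCC/Clang it
 * emits an LFENCE; elsewhere it degrades to a compiler barrier (or to
 * nothing) so the example still compiles.
 */
#if defined(__x86_64__) && defined(__GNUC__)
# define sketch_fence() __asm__ volatile ( "lfence" ::: "memory" )
#elif defined(__GNUC__)
# define sketch_fence() __asm__ volatile ( "" ::: "memory" )
#else
# define sketch_fence() ((void)0)
#endif

/* Executes the fence and then yields true, so it can sit in an expression. */
static inline bool sketch_fence_true(void)
{
    sketch_fence();
    return true;
}

/*
 * "Evaluate condition, stash result, fence, use variable": the condition
 * is computed by the caller, one fence follows, and the caller then
 * branches on the returned value. The fence sits before the branch.
 */
static inline bool evaluate_stash_sketch(bool cond)
{
    sketch_fence();
    return cond;
}

/*
 * "Both branches" via logic operators: short-circuit evaluation means a
 * true condition runs the first fence, a false condition runs the second.
 * Either way the fence is evaluated after the branch on cond, at least as
 * long as the compiler does not merge the two fences.
 */
#define evaluate_both_sketch(cond) \
    (((cond) && sketch_fence_true()) || !sketch_fence_true())

/* Hypothetical caller: a bounds-checked table lookup. */
static int lookup_sketch(const int *table, size_t nr, size_t idx)
{
    assert(table != NULL);

    if ( evaluate_both_sketch(idx < nr) )
        return table[idx];

    return -1;
}
```

Both variants return the same boolean value as the plain condition; they
differ only in where the fence lands relative to the conditional jump,
and that can only be confirmed by inspecting the generated code.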

I am happy to go back to my earlier version, without a configuration
option, that protects both branches with an lfence instruction, using
logic operators. For this version, I actually looked into the object
dump and checked at various locations that the lfence instruction was
added to both blocks after the jump instruction. So the compiler I used
did not move the lfence instructions before the jump instruction and
merge them. I actually hope that the lazy evaluation of the logic
operators prevents the compiler from doing so.

A note on performance: I created a set of micro benchmarks that call
certain hypercall+command pairs in a tight loop many times. These
hypercalls target locations I modified with this patch series. The
current state of testing shows that, in the worst case, the full series
adds at most 3% runtime (relative to what the same hypercall took before
the modification). The testing used the evaluate_nospec implementation
that protects both branches via logic operators. Given that those are
micro benchmarks, I expect the impact on usual user workloads to be even
lower, but I have not measured any userland benchmarks yet. If you can
point me to the performance tests you typically use, I can also look
into that. Thanks!

Best,
Norbert

>
> Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel