On Tue, 9 Jan 2018, Richard Earnshaw (lists) wrote:
> > Sorry, I don't follow. On ARM, it is surprising that CSEL-CSDB-LDR sequence
> > wouldn't work (applying CSEL to the address rather than loaded value), and
> > if it wouldn't, then ARM-specific lowering of the builtin can handle that
> > anyhow, right? (by spilling the pointer)
>
> The load has to feed /in/ to the csel/csdb sequence, not come after it.

Again, I'm sorry, but I have to insist that what you're saying here contradicts
the documentation linked from https://developer.arm.com/support/security-update
The PDF currently says, in "Details of the CSDB barrier":

    Until the barrier completes:
    1) For any load, store, data or instruction preload, RW2, appearing in
    program order *after the barrier* [...]
    2) For any indirect branch (B2), appearing in program order
    *after the barrier* [...]
    [...] the speculative execution of RW2/B2 does not influence the
    allocations of entries in a cache [...]

It doesn't say anything about the behavior of CSDB being dependent on the loads
encountered prior to it. (and imho it doesn't make sense for a hardware
implementation to work that way)

> As I explained to Bernd last night, I think this is likely be unsafe.
> If there's some control path before __builtin_nontransparent that allows
> 'predicate' to be simplified (eg by value range propagation), then your
> guard doesn't protect against the speculation that you think it does.
> Changing all the optimizers to guarantee that wouldn't happen (and
> guaranteeing that all future optimizers won't introduce new problems of
> that nature) is, I suspect, very non-trivial.

But note that in that case the compiler could have performed the same
simplification in the original code as well, emitting straight-line machine code
lacking speculatively executable parts in the first place.

Alexander