On Mon, Oct 11, 2021 at 08:46:17AM -0500, Paul A. Clarke wrote:
> On Fri, Oct 08, 2021 at 05:31:11PM -0500, Segher Boessenkool wrote:
> > "asm volatile" means there is a side effect in the asm. This means that
> > it has to be executed on the real machine the same as on the abstract
> > machine, with the side effects in the same order.
> >
> > It can still be reordered, modulo those restrictions. It can be merged
> > with an identical asm as well. And the compiler can split this into two
> > identical asms on two paths.
>
> It seems odd to me that the compiler can make any assumptions about
> the side-effect(s). How does it know that a side-effect does not alter
> computation (as it indeed does in this case), such that reordering is
> a still correct (which it wouldn't be in this case)?

Because by definition side effects do not change the computation (where
"computation" means "the outputs of the asm")! And if you are talking
about changing future computations, which is what floating point control
flags can be used for: that falls outside of the C abstract machine, other
than fe[gs]etround etc.

> > > With respect to volatile, I worry about removing it, because I do
> > > indeed need that instruction to execute in order to clear the FPSCR
> > > exception enable bits. That side-effect is not otherwise known to the
> > > compiler.
> >
> > Yes. But as said above, volatile isn't enough to get this to behave
> > correctly.
> >
> > The easiest way out is to write this all in one piece of (inline) asm.
>
> Ugh. I really don't want to go there, not just because it's work, but
> I think this is a paradigm that should work without needing to drop
> fully into asm.

Yes. Let's say GCC still has some challenges here :-(

> Is there something unique about using an "asm" statement versus using,
> say, a builtin like __builtin_mtfsf or a hypothetical __builtin_mffsce?

Nope.

> Very similar methods are used in glibc today. Are those broken?

Maybe. If you get a real (i.e. not inline) function call there, that can
often save you.

> Would creating a __builtin_mffsce be another solution?

Yes. And not a bad idea in the first place.

> Would adding memory barriers between the FPSCR manipulations and the
> code which is bracketed by them be sufficient?

No, what you want to order is not memory accesses, but FP computations
relative to the insns that change the FP control bits. If *both* of
those change memory you can artificially order them with that. But most
FP computations do not access memory.


Segher
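The sketch below illustrates the "artificially order them" point from the last answer. Its assumptions:

- It uses GCC-style inline asm and statement expressions.
- FP_CONTROL_SET and FP_CONTROL_RESTORE are hypothetical placeholders, empty asm here, that stand in for the real FPSCR-changing instructions.
- FP_OPT_BARRIER and FP_FORCE_EVAL are illustrative names. They are loosely modeled on the idea behind glibc's internal math_opt_barrier and math_force_eval.

The multiply's input is routed through memory in a volatile asm placed after the control change. Its result is read from memory by a volatile asm placed before the restore. The data dependencies, together with volatile asms keeping their relative order as described above, keep the computation between the two.

```c
/* Hypothetical placeholders for the real FP-control asm (for example the
   mffsce / mtfsf sequence on POWER).  They are empty here so that the
   sketch compiles on any GCC target.  */
#define FP_CONTROL_SET()     __asm__ __volatile__ ("" ::: "memory")
#define FP_CONTROL_RESTORE() __asm__ __volatile__ ("" ::: "memory")

/* Route X through memory in a volatile asm.  The compiler must assume the
   asm may have modified X, so anything computed from the result cannot be
   hoisted above this point.  */
#define FP_OPT_BARRIER(x)                              \
  ({ __typeof__ (x) fp_tmp_ = (x);                     \
     __asm__ __volatile__ ("" : "+m" (fp_tmp_));       \
     fp_tmp_; })

/* Force X to be materialized before this point: the volatile asm reads it
   from memory.  */
#define FP_FORCE_EVAL(x)                               \
  do {                                                 \
    __typeof__ (x) fp_tmp_ = (x);                      \
    __asm__ __volatile__ ("" : : "m" (fp_tmp_));       \
  } while (0)

/* Multiply a * b while the (hypothetical) FP control change is in effect.  */
static double bracketed_mul (double a, double b)
{
  FP_CONTROL_SET ();
  a = FP_OPT_BARRIER (a);   /* a is now "produced" after FP_CONTROL_SET  */
  double r = a * b;         /* so the multiply cannot move above it      */
  FP_FORCE_EVAL (r);        /* and r must exist before the restore below */
  FP_CONTROL_RESTORE ();
  return r;
}
```

For example, bracketed_mul (3.0, 4.0) returns 12.0.

This only constrains the values that are explicitly passed through the barriers. Every FP operation in the bracketed region needs the same treatment. That is part of why writing the whole sequence as one asm, or having a dedicated builtin, is attractive.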