On Fri, Oct 08, 2021 at 05:31:11PM -0500, Segher Boessenkool wrote:
> On Fri, Oct 08, 2021 at 02:27:28PM -0500, Paul A. Clarke wrote:
> > On Fri, Oct 08, 2021 at 12:39:15PM -0500, Segher Boessenkool wrote:
> > I see. Thanks for the reference. If I understand correctly, volatile
> > prevents some optimizations based on the defined inputs/outputs, but
> > the asm could still be subject to reordering.
>
> "asm volatile" means there is a side effect in the asm. This means that
> it has to be executed on the real machine the same as on the abstract
> machine, with the side effects in the same order.
>
> It can still be reordered, modulo those restrictions. It can be merged
> with an identical asm as well. And the compiler can split this into two
> identical asms on two paths.

It seems odd to me that the compiler can make any assumptions about the
side-effect(s). How does it know that a side-effect does not alter
computation (as it indeed does in this case), such that reordering is
still correct (which it wouldn't be in this case)?

> In this case you might want a side effect (the instructions writes to
> the FPSCR after all). But you need this to be tied to the FP code that
> you want the flags to be changed for, and to the restore of the flags,
> and finally you need to prevent other FP code from being scheduled in
> between.
>
> You need more for that than just volatile, and the solution may well
> make volatile not wanted: tying the insns together somehow will
> naturally make the flags restored to a sane situation again, so the
> whole group can be removed if you want, etc.
>
> > In this particular case, I don't think it's an issue with respect to
> > reordering. The code in question is:
> > + __asm__ __volatile__ ("mffsce %0" : "=f" (__fpscr_save.__fr));
> > + __enables_save.__fpscr = __fpscr_save.__fpscr & 0xf8;
> >
> > The output (__fpscr_save) is a source for the following assignment,
> > so the order should be respected, no?
>
> Other FP code can be interleaved, and then do the wrong thing.
>
> > With respect to volatile, I worry about removing it, because I do
> > indeed need that instruction to execute in order to clear the FPSCR
> > exception enable bits. That side-effect is not otherwise known to the
> > compiler.
>
> Yes. But as said above, volatile isn't enough to get this to behave
> correctly.
>
> The easiest way out is to write this all in one piece of (inline) asm.

Ugh. I really don't want to go there, not just because it's work, but
I think this is a paradigm that should work without needing to drop
fully into asm.

Is there something unique about using an "asm" statement versus using,
say, a builtin like __builtin_mtfsf or a hypothetical __builtin_mffsce?
Very similar methods are used in glibc today.
Are those broken?

Would creating a __builtin_mffsce be another solution?

Would adding memory barriers between the FPSCR manipulations and the code
which is bracketed by them be sufficient?

PC
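
For reference, a minimal sketch of the bit arithmetic in the two quoted
patch lines, written as plain integer code so it runs on any host. It
assumes the Power ISA layout of the low FPSCR byte (VE, OE, UE, ZE, XE in
0xf8; NI and RN in the low three bits). The names fpscr_image,
saved_enables and after_mffsce are hypothetical stand-ins and are not
taken from the patch. The sketch models only the values involved; it says
nothing about instruction scheduling, which is the open question in this
thread.

```c
#include <assert.h>

/* Exception-enable bits in the low byte of the FPSCR (Power ISA layout). */
#define FPSCR_VE 0x80ULL /* invalid-operation exception enable */
#define FPSCR_OE 0x40ULL /* overflow exception enable */
#define FPSCR_UE 0x20ULL /* underflow exception enable */
#define FPSCR_ZE 0x10ULL /* zero-divide exception enable */
#define FPSCR_XE 0x08ULL /* inexact exception enable */
#define FPSCR_ENABLES (FPSCR_VE | FPSCR_OE | FPSCR_UE | FPSCR_ZE | FPSCR_XE)

/* Hypothetical stand-in for the union in the patch: the FPSCR image
   arrives in a floating-point register and is reinterpreted as an
   integer. */
typedef union {
  double fr;
  unsigned long long fpscr;
} fpscr_image;

/* The value the patch keeps in __enables_save: only the enable bits. */
static inline unsigned long long saved_enables(fpscr_image img)
{
  unsigned long long enables = img.fpscr & 0xf8;
  /* 0xf8 is exactly the five enable bits, nothing more. */
  assert(enables == (img.fpscr & FPSCR_ENABLES));
  return enables;
}

/* Model of what mffsce leaves in the FPSCR itself: the same image with
   the five enable bits cleared. */
static inline unsigned long long after_mffsce(fpscr_image img)
{
  return img.fpscr & ~FPSCR_ENABLES;
}
```

With an image whose low byte is 0x83 (VE set, RN = 3), saved_enables
yields 0x80 and after_mffsce yields 0x03, so the rounding mode survives
while the enables are held aside for the later restore.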