On Fri, Oct 08, 2021 at 05:31:11PM -0500, Segher Boessenkool wrote:
> On Fri, Oct 08, 2021 at 02:27:28PM -0500, Paul A. Clarke wrote:
> > On Fri, Oct 08, 2021 at 12:39:15PM -0500, Segher Boessenkool wrote:
> > I see. Thanks for the reference. If I understand correctly, volatile
> > prevents some optimizations based on the defined inputs/outputs, but
> > the asm could still be subject to reordering.
> 
> "asm volatile" means there is a side effect in the asm.  This means that
> it has to be executed on the real machine the same as on the abstract
> machine, with the side effects in the same order.
> 
> It can still be reordered, modulo those restrictions.  It can be merged
> with an identical asm as well.  And the compiler can split this into two
> identical asms on two paths.

It seems odd to me that the compiler can make any assumptions about
the side-effect(s). How does it know that a side-effect does not alter
computation (as it indeed does in this case), such that reordering is
still correct (which it wouldn't be in this case)?

> In this case you might want a side effect (the instruction writes to
> the FPSCR after all).  But you need this to be tied to the FP code that
> you want the flags to be changed for, and to the restore of the flags,
> and finally you need to prevent other FP code from being scheduled in
> between.
> 
> You need more for that than just volatile, and the solution may well
> make volatile not wanted: tying the insns together somehow will
> naturally make the flags restored to a sane situation again, so the
> whole group can be removed if you want, etc.
> 
> > In this particular case, I don't think it's an issue with respect to
> > reordering.  The code in question is:
> > +      __asm__ __volatile__ ("mffsce %0" : "=f" (__fpscr_save.__fr));
> > +      __enables_save.__fpscr = __fpscr_save.__fpscr & 0xf8;
> > 
> > The output (__fpscr_save) is a source for the following assignment,
> > so the order should be respected, no?
> 
> Other FP code can be interleaved, and then do the wrong thing.
> 
> > With respect to volatile, I worry about removing it, because I do
> > indeed need that instruction to execute in order to clear the FPSCR
> > exception enable bits. That side-effect is not otherwise known to the
> > compiler.
> 
> Yes.  But as said above, volatile isn't enough to get this to behave
> correctly.
> 
> The easiest way out is to write this all in one piece of (inline) asm.

Ugh. I really don't want to go there, not just because it's work, but
because I think this is a paradigm that should work without needing to
drop fully into asm.

Is there something unique about using an "asm" statement versus using,
say, a builtin like __builtin_mtfsf or a hypothetical __builtin_mffsce?
Very similar methods are used in glibc today. Are those broken?

Would creating a __builtin_mffsce be another solution?

Would adding memory barriers between the FPSCR manipulations and the
code which is bracketed by them be sufficient?
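
To make that last question concrete, here is a minimal sketch of the
kind of bracketing I have in mind. It is not the patch code and not a
claim that this is sufficient. Everything named here is hypothetical:
fake_fpscr is a plain variable standing in for the real FPSCR so the
example builds on any target, and save_and_clear_enables() /
restore_fpscr() stand in for the mffsce and restore asms. As far as I
understand, a plain "memory" clobber only orders memory accesses, so
the sketch also ties the FP operands to the asms with "+m"
constraints, similar in spirit to the math_opt_barrier /
math_force_eval idiom in glibc.

```c
#include <stdint.h>

/* Hypothetical stand-in for the FPSCR, so the sketch builds anywhere.
   0xf8 is the exception-enable mask used in the code under review. */
static uint64_t fake_fpscr = 0xf8;

/* Stand-in for "mffsce": return the old value, clear the enable bits. */
static inline uint64_t save_and_clear_enables(void)
{
    uint64_t old = fake_fpscr;
    fake_fpscr &= ~(uint64_t)0xf8;
    /* Compiler-only barrier: orders memory accesses, emits no code. */
    __asm__ __volatile__ ("" ::: "memory");
    return old;
}

/* Stand-in for the asm that restores the saved FPSCR value. */
static inline void restore_fpscr(uint64_t old)
{
    __asm__ __volatile__ ("" ::: "memory");
    fake_fpscr = old;
}

/* Hypothetical example of FP code bracketed by the save/restore. */
double bracketed_divide(double num, double den)
{
    uint64_t saved = save_and_clear_enables();

    /* The compiler must assume this asm may modify num and den, so
       the division cannot be computed from their earlier values. */
    __asm__ __volatile__ ("" : "+m" (num), "+m" (den));

    double q = num / den;

    /* This asm reads q, so q must be computed before this point. */
    __asm__ __volatile__ ("" : "+m" (q));

    restore_fpscr(saved);
    return q;
}
```

The idea is that the data dependencies through num, den and q, rather
than the "memory" clobbers, are what keep the division between the
two volatile asms. Whether that is enough to keep unrelated FP code
from being scheduled into the bracket is exactly what I am unsure of.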

PC
