On Thu, Oct 07, 2021 at 06:39:06PM -0500, Segher Boessenkool wrote:
> On Mon, Aug 23, 2021 at 02:03:05PM -0500, Paul A. Clarke wrote:
> > No attempt is made to optimize writing the FPSCR (by checking if the new
> > value would be the same), other than using lighter weight instructions
> > when possible.
>
> __builtin_set_fpscr_rn makes optimised code (using mtfsb[01])
> automatically, fwiw.
>
> > Move implementations of _mm_ceil* and _mm_floor* into _mm_round*, and
> > convert _mm_ceil* and _mm_floor* into macros. This matches the current
> > analogous implementations in config/i386/smmintrin.h.
>
> Hrm. Using function-like macros is begging for trouble, as usual. But
> the x86 version does this, so meh.
>
> > +extern __inline __m128d
> > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> > +_mm_round_pd (__m128d __A, int __rounding)
> > +{
> > +  __v2df __r;
> > +  union {
> > +    double __fr;
> > +    long long __fpscr;
> > +  } __enables_save, __fpscr_save;
> > +
> > +  if (__rounding & _MM_FROUND_NO_EXC)
> > +    {
> > +      /* Save enabled exceptions, disable all exceptions,
> > +         and preserve the rounding mode. */
> > +#ifdef _ARCH_PWR9
> > +      __asm__ __volatile__ ("mffsce %0" : "=f" (__fpscr_save.__fr));
>
> The __volatile__ does likely not do what you want. As far as I can see
> you do not want one here anyway?
>
> "volatile" does not order asm wrt fp insns, which you likely *do* want.

Reading the GCC docs, it looks like the "volatile" qualifier for "asm" has
no effect at all (6.47.1):

| The optional volatile qualifier has no effect. All basic asm blocks are
| implicitly volatile.

So, it could be removed without concern.

> > +  __v2df __r = { ((__v2df)__B)[0], ((__v2df) __A)[1] };
>
> You put spaces after only some casts, btw? Well maybe I found the one
> place you did it wrong, heh :-) And you can avoid having so many parens
> by making extra variables -- much more readable.

I'll fix this.

> > +  switch (__rounding)
>
> You do not need any of that __ either.

I'm surprised that I don't. A .h file needs to be concerned about the
namespace it inherits, no?

> > +/* { dg-do run } */
> > +/* { dg-require-effective-target powerpc_vsx_ok } */
> > +/* { dg-options "-O2 -mvsx" } */
>
> "dg-do run" requires vsx_hw, not just vsx_ok. Testing on a machine
> without VSX (so before p7) would have shown that, but do you have access
> to any? This is one of those things we are only told about a year after
> it was added, because no one who tests often does that on so old
> hardware :-)
>
> So, okay for trunk (and backports after some burn-in) with that vsx_ok
> fixed. That asm needs fixing, but you can do that later.

OK.

Thanks!
PC
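
For reference, the "ceil/floor as macros over round" arrangement discussed
above can be sketched in portable scalar C. This is only an illustration
under stated assumptions: every sketch_* name and the SKETCH_* constants are
hypothetical, the rounding is done by integer truncation rather than via the
FPSCR, and it is valid only for values that fit in a long long. It is not
the code in the patch.

```c
#include <assert.h>

/* Hypothetical mode constants; the real header uses _MM_FROUND_* values. */
enum sketch_mode { SKETCH_FLOOR, SKETCH_CEIL };

/* One routine owns the rounding logic.  Assumes x fits in a long long. */
static inline double
sketch_round (double x, enum sketch_mode mode)
{
  long long t = (long long) x;  /* conversion truncates toward zero */
  double r = (double) t;

  if (mode == SKETCH_FLOOR && r > x)
    r -= 1.0;                   /* negative non-integer: step down */
  else if (mode == SKETCH_CEIL && r < x)
    r += 1.0;                   /* positive non-integer: step up */
  return r;
}

/* Thin forwarding macros; the argument is parenthesized and used once. */
#define sketch_floor(V) sketch_round ((V), SKETCH_FLOOR)
#define sketch_ceil(V)  sketch_round ((V), SKETCH_CEIL)

/* Small self-check exercising both macros on both signs. */
static inline int
sketch_demo (void)
{
  assert (sketch_floor (2.5) == 2.0);
  assert (sketch_ceil (2.5) == 3.0);
  assert (sketch_floor (-2.5) == -3.0);
  assert (sketch_ceil (-2.5) == -2.0);
  return 0;
}
```

Because each macro only forwards its argument once to a real function, it
avoids the usual double-evaluation problem. The remaining cost of the macro
form is that sketch_floor and sketch_ceil cannot have their address taken.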