On Mon, Oct 11, 2021 at 05:04:12PM -0500, Segher Boessenkool wrote:
> On Mon, Oct 11, 2021 at 12:31:07PM -0500, Paul A. Clarke wrote:
> > On Mon, Oct 11, 2021 at 11:28:39AM -0500, Segher Boessenkool wrote:
> > > > Very similar methods are used in glibc today. Are those broken?
> > > 
> > > Maybe.
> > 
> > Ouch.
> 
> So show the code?

You asked for it. ;-)  Boiled down to remove macroisms and code that
should be removed by optimization:
--
static __inline __attribute__ ((__always_inline__)) void
libc_feholdsetround_ppc_ctx (struct rm_ctx *ctx, int r)
{
  fenv_union_t old;
  register fenv_union_t __fr;
  __asm__ __volatile__ ("mffscrni %0,%1" : "=f" (__fr.fenv) : "i" (r));
  ctx->env = old.fenv = __fr.fenv; 
  ctx->updated_status = (r != (old.l & 3));
}
static __inline __attribute__ ((__always_inline__)) void
libc_feresetround_ppc (fenv_t *envp)
{ 
  fenv_union_t new = { .fenv = *envp };
  register fenv_union_t __fr;
  __fr.l = new.l & 3;
  __asm__ __volatile__ ("mffscrn %0,%1" : "=f" (__fr.fenv) : "f" (__fr.fenv));
}
double
__sin (double x)
{
  struct rm_ctx ctx __attribute__ ((cleanup (libc_feresetround_ppc_ctx)));
  libc_feholdsetround_ppc_ctx (&ctx, (0));
  /* floating point intensive code.  */
  return retval;
}
--

There's not much to it, really.  "mffscrni" on the way in to save and set
a required rounding mode, and "mffscrn" on the way out to restore it.

> > > If you get a real (i.e. not inline) function call there, that
> > > can save you often.
> > 
> > Calling a real function in order to execute a single instruction is
> > sub-optimal. ;-)
> 
> Calling a real function (that does not even need a stack frame, just a
> blr) is not terribly expensive, either.

Not ideal, better would be better.

> > > > Would creating a __builtin_mffsce be another solution?
> > > 
> > > Yes.  And not a bad idea in the first place.
> > 
> > The previous "Nope" and this "Yes" seem in contradiction. If there is no
> > difference between "asm" and builtin, how does using a builtin solve the
> > problem?
> 
> You will have to make the builtin solve it.  What a builtin can do is
> virtually unlimited.  What an asm can do is not: it just outputs some
> assembler language, and does in/out/clobber constraints.  You can do a
> *lot* with that, but it is much more limited than everything you can do
> in the compiler!  :-)
> 
> The fact remains that there is no way in RTL (or Gimple for that matter)
> to express things like rounding mode changes.  You will need to
> artificially make some barriers.

I know there is __builtin_set_fpscr_rn that generates mffscrn. This
is not used in the code above because I believe it first appears in
GCC 9.1 or so, and glibc still supports GCC 6.2 (and it doesn't define
a return value, which would be handy in this case).  Does the
implementation of that builtin meet the requirements needed here,
to prevent reordering of FP computation across instantiations of the
builtin?  If not, is there a model on which to base an implementation
of __builtin_mffsce (or some preferred name)?

PC

Reply via email to