[Bug target/53513] SH Target: Add support for fschg and fpchg insns

olegendo at gcc dot gnu.org Sun, 16 Mar 2014 13:48:07 -0700

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53513


Oleg Endo <olegendo at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2014-03-16
     Ever confirmed|0                           |1

--- Comment #4 from Oleg Endo <olegendo at gcc dot gnu.org> ---
As mentioned in PR 60138, this issue also prevents a working implementation of
fenv.h & friends on SH.

The idea would be to get rid of the __fpscr_values first and set the FPSCR.PR
bit with insn sequences such as:

set pr = 1:
  sts     fpscr,r2
  mov.l   #(1 << 19),r1
  or      r1,r2
  lds     r2,fpscr

set pr = 0:
  sts     fpscr,r2
  mov.l   #~(1 << 19),r1
  and     r1,r2
  lds     r2,fpscr

This would obviously result in a performance regression but would work with all
SH FPUs.
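
For illustration only, here is the same read-modify-write expressed in C on a
plain 32 bit value instead of the real register.  The helper names are made up
for this sketch and are not part of GCC or libgcc; the only thing taken from
the sequences above is that PR is bit 19.

```c
#include <stdint.h>

/* PR is bit 19 of FPSCR, as in the insn sequences above.  */
#define FPSCR_PR_BIT (UINT32_C (1) << 19)

/* Models "sts fpscr,r2 ; or r1,r2 ; lds r2,fpscr" on a plain value.
   Hypothetical helper, for illustration only.  */
static uint32_t fpscr_set_pr (uint32_t fpscr)
{
  return fpscr | FPSCR_PR_BIT;
}

/* Models "sts fpscr,r2 ; and r1,r2 ; lds r2,fpscr" on a plain value.
   All bits other than PR are left untouched.  */
static uint32_t fpscr_clear_pr (uint32_t fpscr)
{
  return fpscr & ~FPSCR_PR_BIT;
}
```

Both helpers leave every other FPSCR bit as it was, which is the point of
doing the read-modify-write instead of loading a precomputed value.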

On SH4A this can then be improved by adding support for fpchg, although this
would require changes/extensions to the mode switching machinery, as mentioned
in PR 29349.  The problem is that the mode switching pass emits only mode
changes to a particular mode, not from mode 'x' to mode 'y'.  Since fpchg
toggles the PR bit, it can only be used if the mode before the switch is
known.  In PR 29349 an extension of the pre_edge_lcm function is suggested
which would make the necessary information available.
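
A rough sketch of the decision that becomes possible once the 'from' mode is
known.  All names here are hypothetical and only model the idea; this is not
how the mode switching pass is actually structured.  PR_UNKNOWN stands for the
current situation, where only the target mode is available.

```c
/* Hypothetical model of the PR mode before/after a switch.  */
enum pr_mode { PR_SINGLE, PR_DOUBLE, PR_UNKNOWN };

/* What would have to be emitted for a switch.  */
enum pr_insn { PR_NONE, PR_FPCHG, PR_FULL_SEQ };

/* Pick the cheapest way to reach mode 'to' (assumed to be known).
   If 'from' is unknown, only the full sts/or-and/lds sequence is safe.
   If 'from' is known and differs from 'to', a single toggle is enough.  */
static enum pr_insn select_pr_switch (enum pr_mode from, enum pr_mode to)
{
  if (from == PR_UNKNOWN)
    return PR_FULL_SEQ;
  if (from == to)
    return PR_NONE;
  return PR_FPCHG;
}
```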

Here are a few more 'requirements' for SH specific mode change issues:

1)
The following function:

double test (const float* a, const float* b, const double* c, float x)
{
  float aa = a[0] * b[0];
  double cc = c[0] + c[1];

  aa += b[1] * b[2];
  cc += c[2] + c[3];
  aa += b[3] * b[4];
  cc += c[4] + c[5];
  aa += b[5] * b[6];

  return aa / cc;
}

compiled with -m4 -O2 (default PR mode = double) results in 4 mode switches. 
Rewriting it as:

double test (const float* a, const float* b, const double* c, float x)
{
  float aa = a[0] * b[0] + b[1] * b[2] + b[3] * b[4] + b[5] * b[6];

  double cc = c[0] + c[1];
  cc += c[2] + c[3];
  cc += c[4] + c[5];

  return aa / cc;
}

results in only 2 mode switches (as expected).  FP operations which are
independent should be reordered in order to minimize mode switches.
This could go as far as ...


2)
... doing loop distribution, for cases such as:

double test (const float* x, const double* y, unsigned int c)
{
  float r0 = 0;
  double r1 = 0;

  while (c--)
  {
    float xx = x[0] * x[1] + x[2] + 123.0f;
    x += 3;

    double yy = y[0] + y[1];
    y += 2;

    r0 += xx;
    r1 += yy;
  }

  return r0 + r1;
}

which currently produces a loop with 4 mode switches in it.  Reordering the FP
operations would bring this down to 2 mode switches in the loop.  Since r0 and
r1 are calculated independently, 2 loops can be used, having the mode switches
outside the loops.
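
A hand-distributed version of the function above could look something like
this.  It is only a sketch of what the transformation would produce at the
source level; the name test_distributed and the index variable are mine.

```c
double test_distributed (const float* x, const double* y, unsigned int c)
{
  float r0 = 0;
  double r1 = 0;
  unsigned int i;

  /* Single precision only: one switch to PR = single before the loop.  */
  for (i = 0; i < c; i++)
  {
    float xx = x[0] * x[1] + x[2] + 123.0f;
    x += 3;
    r0 += xx;
  }

  /* Double precision only: one switch to PR = double before the loop.  */
  for (i = 0; i < c; i++)
  {
    double yy = y[0] + y[1];
    y += 2;
    r1 += yy;
  }

  return r0 + r1;
}
```

Each accumulator still sees its additions in the same order as before, so the
result should not change.  The cost is a second loop counter and loop overhead.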


3)
FPSCR.SZ mode changes might interfere with FPSCR.PR mode changes.  For example,
using fschg to flip FPSCR.SZ might require changing FPSCR.PR first (and
potentially changing it back).  If fpchg is not available (it is SH4A only),
it's better to set both bits directly.  In order to minimize mode switches it
might be necessary to reorder instructions and do loop distribution while
looking at the PR and SZ bits simultaneously.
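
"Setting both bits directly" means a single FPSCR update instead of two
separate ones.  Modelled on a plain value, with a hypothetical helper name and
assuming SZ is bit 20 (next to PR at bit 19), that would be roughly:

```c
#include <stdint.h>

#define FPSCR_PR (UINT32_C (1) << 19)
#define FPSCR_SZ (UINT32_C (1) << 20)  /* assumption: SZ is bit 20 */

/* Set PR and SZ to the requested values in one read-modify-write,
   leaving all other bits untouched.  Hypothetical helper.  */
static uint32_t fpscr_set_pr_sz (uint32_t fpscr, int pr, int sz)
{
  fpscr &= ~(FPSCR_PR | FPSCR_SZ);
  if (pr)
    fpscr |= FPSCR_PR;
  if (sz)
    fpscr |= FPSCR_SZ;
  return fpscr;
}
```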


4)
Rounding mode settings also mean FPSCR mode changes:
http://gcc.gnu.org/ml/gcc-patches/2014-02/msg01378.html


5)
In some cases preserving FPSCR bits across mode changes is not required (if I'm
not mistaken):

double func (float a, float b, double c, double d)
{
  #pragma STDC FENV_ACCESS ON

  // function entry, PR = double

  // mode switch PR = single
  float ab = a + b;

  // mode switch PR = double
  double x = ab + c + d;

  // read back FP status bits and do something with them
  return x;

  // function exit, PR = double
}

In this case the mode switch double -> single -> double can be done more
efficiently by pushing the PR = double FPSCR state onto the stack, switching to
PR = single and then switching back to PR = double by popping FPSCR from the
stack.

However, this must not happen if other FPSCR settings are changed after the
first switch to PR = single, e.g. by invoking a standard function that modifies
the floating-point environment or by changing the FPSCR.FR bit on SH4.
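
To show why the push/pop variant is only valid when nothing else touches
FPSCR in between, here is a small model on plain values.  The function names
are made up for this sketch, and 'other_bits' stands for any FPSCR setting
changed while PR = single (for example rounding mode bits, which I assume to
be the low two bits).

```c
#include <stdint.h>

#define PR_MASK (UINT32_C (1) << 19)

/* Switch back to PR = double by popping the saved FPSCR value.
   Anything changed while PR = single is overwritten by the pop.  */
static uint32_t switch_back_by_pop (uint32_t entry, uint32_t other_bits)
{
  uint32_t saved = entry;            /* push FPSCR, PR = double */
  uint32_t cur = entry & ~PR_MASK;   /* switch to PR = single */
  cur |= other_bits;                 /* other settings changed here */
  (void) cur;                        /* ... and discarded by the pop */
  return saved;                      /* pop FPSCR, PR = double again */
}

/* Switch back to PR = double by setting only the PR bit.
   Other changes made while PR = single are preserved.  */
static uint32_t switch_back_by_or (uint32_t entry, uint32_t other_bits)
{
  uint32_t cur = entry & ~PR_MASK;   /* switch to PR = single */
  cur |= other_bits;                 /* other settings changed here */
  return cur | PR_MASK;              /* set PR = double explicitly */
}
```

With other_bits == 0 both variants end in the same state, so the cheaper pop
is fine.  With any other bit changed they differ, and the pop silently loses
the change.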
