http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53513

Oleg Endo <olegendo at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2014-03-16
     Ever confirmed|0                           |1

--- Comment #4 from Oleg Endo <olegendo at gcc dot gnu.org> ---
As mentioned in PR 60138, this issue also prevents a working implementation of
fenv.h & friends on SH.

The idea would be to get rid of the __fpscr_values first and set the FPSCR.PR
bit with insn sequences such as:

set pr = 1:
    sts    fpscr,r2
    mov.l  #(1 << 19),r1
    or     r1,r2
    lds    r2,fpscr

set pr = 0:
    sts    fpscr,r2
    mov.l  #~(1 << 19),r1
    and    r1,r2
    lds    r2,fpscr

This would obviously result in a performance regression but would work with
all SH FPUs.

On SH4A this can then be improved by adding support for fpchg, although this
would require changes/extensions to the mode switching machinery, as mentioned
in PR 29349. The problem is that the mode switching pass emits only mode
changes to a particular mode, not from mode 'x' to mode 'y'. In PR 29349 an
extension of the pre_edge_lcm function is suggested which would make the
necessary information available.

Here are a few more 'requirements' for SH specific mode change issues:

1) The following function:

    double test (const float* a, const float* b, const double* c, float x)
    {
      float aa = a[0] * b[0];
      double cc = c[0] + c[1];

      aa += b[1] * b[2];
      cc += c[2] + c[3];

      aa += b[3] * b[4];
      cc += c[4] + c[5];

      aa += b[5] * b[6];

      return aa / cc;
    }

compiled with -m4 -O2 (default PR mode = double) results in 4 mode switches.
Rewriting it as:

    double test (const float* a, const float* b, const double* c, float x)
    {
      float aa = a[0] * b[0] + b[1] * b[2] + b[3] * b[4] + b[5] * b[6];

      double cc = c[0] + c[1];
      cc += c[2] + c[3];
      cc += c[4] + c[5];

      return aa / cc;
    }

results in only 2 mode switches (as expected). FP operations which are
independent should be reordered in order to minimize mode switches. This
could go as far as ...

2) ... doing loop distribution, for cases such as:

    double test (const float* x, const double* y, unsigned int c)
    {
      float r0 = 0;
      double r1 = 0;

      while (c--)
      {
        float xx = x[0] * x[1] + x[2] + 123.0f;
        x += 3;

        double yy = y[0] + y[1];
        y += 2;

        r0 += xx;
        r1 += yy;
      }
      return r0 + r1;
    }

which currently produces a loop with 4 mode switches in it. Reordering the FP
operations would bring this down to 2 mode switches in the loop. Since r0 and
r1 are calculated independently, 2 loops can be used, with the mode switches
placed outside the loops.

3) FPSCR.SZ mode changes might interfere with FPSCR.PR mode changes. For
example, using fschg to flip FPSCR.SZ might require changing FPSCR.PR first
(and potentially changing it back). If fpchg is not available (it is SH4A
only), it's better to set both bits directly. In order to minimize mode
switches it might be necessary to reorder instructions and do loop
distribution while looking at the PR and SZ bits simultaneously.

4) Rounding mode settings also mean FPSCR mode changes.
http://gcc.gnu.org/ml/gcc-patches/2014-02/msg01378.html

5) In some cases preserving FPSCR bits across mode changes is not required (if
I'm not mistaken):

    double func (float a, float b, double c, double d)
    {
      #pragma STDC FENV_ACCESS ON

      // function entry, PR = double

      // mode switch PR = single
      float ab = a + b;

      // mode switch PR = double
      double x = ab + c + d;

      // read back FP status bits and do something with it

      return x;
      // function exit, PR = double
    }

In this case the mode switch double -> float -> double can be done more
efficiently by pushing the PR = double FPSCR state onto the stack, switching
to PR = single and then switching back to PR = double by popping FPSCR from
the stack. However, this must not happen if other FPSCR settings are changed
after the first switch to PR = single, such as by invoking a fenv modifying
standard function or by changing the FPSCR.FR bit on SH4.
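
To make the caveat in 5) concrete, here is a minimal sketch in portable C that
models FPSCR as a plain 32-bit value. It is not SH code and not what GCC emits.
All function and macro names are hypothetical and only serve the illustration.
The PR mask is the (1 << 19) used in the insn sequences above; the FR bit
position (bit 21) is my assumption for SH4 and stands in for "some other FPSCR
setting" that changes between the two mode switches.

```c
#include <assert.h>
#include <stdint.h>

/* FPSCR.PR mask, same value as the (1 << 19) in the insn sequences. */
#define FPSCR_PR (UINT32_C(1) << 19)
/* Assumed FPSCR.FR position on SH4; used only as "some other setting". */
#define FPSCR_FR (UINT32_C(1) << 21)

/* Models: sts fpscr,r2 ; or r1,r2 ; lds r2,fpscr */
uint32_t fpscr_set_pr(uint32_t fpscr)
{
    return fpscr | FPSCR_PR;
}

/* Models: sts fpscr,r2 ; and r1,r2 ; lds r2,fpscr */
uint32_t fpscr_clear_pr(uint32_t fpscr)
{
    return fpscr & ~FPSCR_PR;
}

/* double -> single -> double using push/pop of the whole FPSCR value.
   flip_fr_inside models another FPSCR change made while PR = single. */
uint32_t roundtrip_push_pop(uint32_t fpscr, int flip_fr_inside)
{
    uint32_t saved = fpscr;            /* "push" the PR = double state */
    fpscr = fpscr_clear_pr(fpscr);     /* switch to PR = single */
    if (flip_fr_inside)
        fpscr ^= FPSCR_FR;             /* unrelated setting changes */
    fpscr = saved;                     /* "pop": overwrites everything */
    return fpscr;
}

/* Same round trip, but each switch only touches the PR bit. */
uint32_t roundtrip_read_modify_write(uint32_t fpscr, int flip_fr_inside)
{
    fpscr = fpscr_clear_pr(fpscr);     /* switch to PR = single */
    if (flip_fr_inside)
        fpscr ^= FPSCR_FR;             /* unrelated setting changes */
    return fpscr_set_pr(fpscr);        /* switch back, FR change survives */
}

/* Usage example: both strategies agree unless something else changes. */
void fpscr_model_demo(void)
{
    uint32_t entry = FPSCR_PR;         /* function entry, PR = double */

    assert(roundtrip_push_pop(entry, 0) ==
           roundtrip_read_modify_write(entry, 0));

    /* With an FR flip in between, the pop silently discards it. */
    assert(roundtrip_push_pop(entry, 1) == entry);
    assert(roundtrip_read_modify_write(entry, 1) == (entry | FPSCR_FR));
}
```

In this model the push/pop variant is only equivalent to the read-modify-write
variant when nothing else writes FPSCR between the two switches, which is the
condition the mode switching pass would have to prove before using it.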