[Bug target/53513] SH Target: Add support for fschg and fpchg insns

olegendo at gcc dot gnu.org Sat, 11 Oct 2014 15:10:19 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53513


--- Comment #16 from Oleg Endo <olegendo at gcc dot gnu.org> ---
I've tried a modified example from PR 5360, using floats instead of doubles:

void loop_p (int np, int non0, float coeff[][2048], float tmp1)
{
  int j, k;

  for (j = non0; j < np; j++)
    for (k = 0; k < j; k++)
      coeff[j][j] -= tmp1 * coeff[j][k];
}

with -O2 -m4a (double mode default) and the patch from comment #15 applied:

  (loop setup code omitted)
        ...
.L6:
        cmp/pl  r5           ! outer loop, set to single
        bf/s    .L7
        sts     fpscr,r7
        mov.l   .L16,r4
        mov     r0,r2
        fmov.s  @r3,fr1
        mov     r5,r1
        and     r4,r7
        lds     r7,fpscr
        .align 2
.L5:
        fmov.s  @r2+,fr0     ! inner loop, no switch
        dt      r1
        fneg    fr0
        fmac    fr0,fr5,fr1
        bf/s    .L5
        fmov.s  fr1,@r3
.L7:
        dt      r6
        add     #1,r5
        add     r9,r0
        bf/s    .L6
        add     r8,r3

        sts     fpscr,r1     ! function return, set to double
        mov.l   .L17,r2
        mov.l   @r15+,r9
        or      r2,r1
        mov.l   @r15+,r8
        rts
        lds     r1,fpscr

Obviously, if the inner loop count is small, the mode switch in the outer loop
will dominate.  Something seems to be missing in the mode-switch optimization.
The mode switch should simply be hoisted above all loops, where it could then
use the fpchg insn on SH4A.
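
To put a rough number on this, here is a small host-side C model of the two
placements.  It is only a sketch: set_single, set_double, fpscr_writes and the
reduced row length of 8 are hypothetical names and values made up for
illustration, and nothing in it is actual compiler output.  Each helper stands
in for one FPSCR update (an sts/and/lds or sts/or/lds sequence today, possibly
a single fpchg on SH4A).

```c
/* Hypothetical model: counts FPSCR updates for the two mode-switch placements.
   Nothing here touches a real FPSCR; the helpers only bump a counter.  */
unsigned long fpscr_writes;

void set_single (void) { fpscr_writes++; }  /* stands in for sts/and/lds */
void set_double (void) { fpscr_writes++; }  /* stands in for sts/or/lds */

/* Shape of the code shown above: the switch to single sits in the outer
   loop, guarded by the j > 0 test (cmp/pl r5).  */
void loop_current (int np, int non0, float coeff[][8], float tmp1)
{
  int j, k;

  for (j = non0; j < np; j++)
    if (j > 0)
      {
        set_single ();
        for (k = 0; k < j; k++)
          coeff[j][j] -= tmp1 * coeff[j][k];
      }
  set_double ();  /* function return, back to the double default */
}

/* Assumed desired shape: one switch before all loops, one after.  */
void loop_hoisted (int np, int non0, float coeff[][8], float tmp1)
{
  int j, k;

  set_single ();
  for (j = non0; j < np; j++)
    for (k = 0; k < j; k++)
      coeff[j][j] -= tmp1 * coeff[j][k];
  set_double ();
}
```

In this model the current placement costs one FPSCR update per outer iteration
with j > 0 plus one at return, while the hoisted placement costs two regardless
of np.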
