https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53513

--- Comment #14 from Oleg Endo <olegendo at gcc dot gnu.org> ---
Created attachment 33690
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33690&action=edit
a possible patch

This is a simple patch that does sts-lds fpscr mode switching (not fully
tested).

With the patch applied, the following example

float test (float x, float y)
{
  return x + y;
}

compiled with -O2 -m4 (default fpu mode = double) results in:

        sts     fpscr,r1        ! 20    fpu_switch/8
        mov.l   .L2,r2          ! 22    movsi_ie/1
        xor     r2,r1           ! 23    *xorsi3_compact/2
        lds     r1,fpscr        ! 25    fpu_switch/5
        xor     r2,r1           ! 29    *xorsi3_compact/2
        fmov    fr5,fr0         ! 34    movsf_ie/1
        fadd    fr4,fr0         ! 8     addsf3_i
        rts                     ! 37    *return_i
        lds     r1,fpscr        ! 31    fpu_switch/5
.L3:
        .align 2
.L2:
        .long   524288

The switch is done by 3 (+2 artificial) individual instructions (load - modify
- store).  In this case the RA / optimizers figure out that there's no need to
store fpscr twice and reorder the operations.  This is because all the fp insn
patterns in the machine description only "use" the fpscr, but actually they
also modify it.  As a result, the second 'lds' writes back the value that was
read before the 'fadd'.  This means that the fenv is reset after the 'fadd',
i.e. it potentially clears exception flags etc.
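
To make the effect easier to see, here is a rough C model of the sequence
above.  It is only a sketch: fpscr is modeled as a plain integer, model_fadd
is a hypothetical stand-in for an fp insn that sets a sticky flag, and the
flag position (bit 2) is an assumption made for illustration.  Only the
constant 524288 (1 << 19) is taken from the listing.

```c
#include <assert.h>
#include <stdint.h>

/* 524288 == 1 << 19, the constant loaded from .L2 in the listing.  */
#define FPSCR_MODE_TOGGLE UINT32_C(0x00080000)

/* ASSUMPTION: bit 2 stands in for a sticky exception flag.  */
#define FPSCR_STICKY_FLAG UINT32_C(0x00000004)

/* Hypothetical model of an fp insn that also writes a flag into fpscr.  */
static uint32_t model_fadd (uint32_t fpscr)
{
  return fpscr | FPSCR_STICKY_FLAG;
}

/* Follows the order of the emitted insns and returns the final fpscr.  */
static uint32_t model_emitted_sequence (uint32_t fpscr)
{
  uint32_t r1 = fpscr;                  /* sts  fpscr,r1 */
  r1 ^= FPSCR_MODE_TOGGLE;              /* xor  r2,r1 */
  fpscr = r1;                           /* lds  r1,fpscr */
  r1 ^= FPSCR_MODE_TOGGLE;              /* xor  r2,r1 (entry value again) */
  fpscr = model_fadd (fpscr);           /* fadd fr4,fr0 */
  assert (fpscr & FPSCR_STICKY_FLAG);   /* the flag is set at this point */
  fpscr = r1;                           /* lds  r1,fpscr (delay slot) */
  return fpscr;                         /* the flag is gone again */
}

/* Same switch, but fpscr is re-read after the fp insn before switching
   back, so whatever the insn wrote survives.  */
static uint32_t model_reread_sequence (uint32_t fpscr)
{
  fpscr ^= FPSCR_MODE_TOGGLE;           /* switch mode */
  fpscr = model_fadd (fpscr);           /* fp insn updates fpscr */
  fpscr ^= FPSCR_MODE_TOGGLE;           /* sts / xor / lds after the insn */
  return fpscr;
}
```

Starting from 0x00080000, model_emitted_sequence returns 0x00080000 (the flag
is lost), while model_reread_sequence returns 0x00080004 (the flag is kept).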

I think this is wrong.  It also seems impossible to get the fpscr value
immediately after the fp insn, as it always gets reordered in some way.  As far
as I understand, the patterns of all the fp insns that update bits in fpscr
should actually say so (clobber it or set it in some way) and a builtin
"get_fpscr" is required so that optimizers see the dependencies on fpscr.
