https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53513

--- Comment #17 from Oleg Endo <olegendo at gcc dot gnu.org> ---
(In reply to Oleg Endo from comment #14)
> 
> The switch is done by 3 (+2 artificial) individual instructions (load -
> modify - store).  In this case the RA / optimizers figure out that there's
> no need to store fpscr twice and reorder the operations.  This is because
> all the fp insn patterns in the machine description only "use" the fpscr,
> but actually they also modify it.  This means that the fenv is reset after
> the 'fadd', i.e. it potentially clears exception flags etc.
> 
> I think this is wrong.  It also seems impossible to get the fpscr value
> immediately after the fp insn, as it always gets reordered in some way.  As
> far as I understand, all the fp insns that update bits in fpscr should
> actually do so (clobber it or set it in some way) and a builtin "get_fpscr"
> is required so that optimizers see the dependencies on fpscr.

In the 'addsf3_i' pattern, I've tried replacing the

    (use (match_operand:PSI 3 "fpscr_operand" "c"))

with

    (set (match_operand:PSI 3 "fpscr_operand" "=&c")
         (unspec:PSI [(match_dup 3)] UNSPEC_FPSCR_SET))]

and after that the asm output looks OK:
        sts     fpscr,r1
        mov.l   .L2,r2
        xor     r2,r1
        lds     r1,fpscr
        fmov    fr5,fr0
        fadd    fr4,fr0
        sts     fpscr,r1
        xor     r2,r1
        rts
        lds     r1,fpscr

I haven't checked all the other side effects it could have, but at least the
FMA combine patterns still seem to work after that change.
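
To illustrate why the second 'sts fpscr,r1' after the 'fadd' matters, here is
a rough C model of the two orderings.  It is only a sketch: the 'fpscr'
variable, MODE_BIT, FLAG_BIT and model_fadd() are made-up stand-ins, not the
real SH FPSCR layout or anything in the compiler.  It assumes the fp insn sets
a sticky flag as a side effect, which is the case the quoted comment worries
about.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model only: these bit values are placeholders, not the real layout. */
#define MODE_BIT 0x100u   /* stands in for the bit toggled by the xor */
#define FLAG_BIT 0x001u   /* stands in for a sticky exception flag */

static uint32_t fpscr;    /* simulated status/control register */

/* Simulated fp insn: as a side effect it sets a sticky flag in fpscr. */
static void model_fadd(void)
{
    fpscr |= FLAG_BIT;
}

/* Ordering of the asm above: fpscr is read again after the fadd. */
static uint32_t switch_reload(void)
{
    uint32_t r1 = fpscr;  /* sts fpscr,r1 */
    r1 ^= MODE_BIT;       /* xor r2,r1    */
    fpscr = r1;           /* lds r1,fpscr */
    model_fadd();         /* fadd         */
    r1 = fpscr;           /* sts fpscr,r1 */
    r1 ^= MODE_BIT;       /* xor r2,r1    */
    fpscr = r1;           /* lds r1,fpscr */
    return fpscr;
}

/* Ordering that is possible when the fadd only "uses" fpscr:
   the value saved before the fadd is written back afterwards. */
static uint32_t switch_stale(void)
{
    uint32_t saved = fpscr;
    fpscr = saved ^ MODE_BIT;
    model_fadd();
    fpscr = saved;        /* the flag set by the fadd is lost here */
    return fpscr;
}

/* Small self-check of the model. */
static void check_model(void)
{
    fpscr = 0;
    assert(switch_reload() == FLAG_BIT);  /* flag survives, mode restored */
    fpscr = 0;
    assert(switch_stale() == 0);          /* flag has been wiped */
}
```

In this model switch_reload() keeps the flag and restores the mode bit, while
switch_stale() restores the mode bit but drops the flag.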
