https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53513
--- Comment #17 from Oleg Endo <olegendo at gcc dot gnu.org> ---
(In reply to Oleg Endo from comment #14)
>
> The switch is done by 3 (+2 artificial) individual instructions (load -
> modify - store). In this case the RA / optimizers figure out that there's
> no need to store fpscr twice and reorder the operations. This is because
> all the fp insn patterns in the machine description only "use" the fpscr,
> but actually they also modify it. This means that the fenv is reset after
> the 'fadd', i.e. it potentially clears exception flags etc.
>
> I think this is wrong. It also seems impossible to get the fpscr value
> immediately after the fp insn, as it always gets reordered in some way. As
> far as I understand, all the fp insns that update bits in fpscr should
> actually do so (clobber it or set it in someway) and a builtin "get_fpscr"
> is required so that optimizers see the dependencies on fpscr.

In the 'addsf3_i' pattern, I've tried replacing the

    (use (match_operand:PSI 3 "fpscr_operand" "c"))

with

    (set (match_operand:PSI 3 "fpscr_operand" "=&c")
         (unspec:PSI [(match_dup 3)] UNSPEC_FPSCR_SET))]

and after that the asm output looks OK:

        sts     fpscr,r1
        mov.l   .L2,r2
        xor     r2,r1
        lds     r1,fpscr
        fmov    fr5,fr0
        fadd    fr4,fr0
        sts     fpscr,r1
        xor     r2,r1
        rts
        lds     r1,fpscr

I haven't checked all the other side effects it could have, but at least
the FMA combine patterns still seem to work after that change.
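
To make the dependency problem concrete, here is a rough sketch in plain C. It is a toy model, not SH hardware and not GCC code: a single global word stands in for fpscr, and the names `MODE_MASK`, `FLAG_INEXACT`, `model_fadd`, `stale_restore`, `fresh_restore` and `check_model` are all made up for illustration, as are the bit values. It only shows why reusing the fpscr copy read before the 'fadd' can drop a flag that the 'fadd' set, while re-reading fpscr after the 'fadd' (as in the asm output above) keeps it.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model only: placeholder bit values, not taken from any manual. */
#define MODE_MASK    0x00080000u  /* hypothetical mode bit toggled via xor */
#define FLAG_INEXACT 0x00000004u  /* hypothetical sticky exception flag    */

static uint32_t fpscr;            /* stands in for the real register */

/* Model of an fp insn that modifies fpscr, not just "uses" it. */
static float model_fadd(float a, float b)
{
    fpscr |= FLAG_INEXACT;        /* pretend the add raised a flag */
    return a + b;
}

/* Reordered sequence: fpscr is read once and the stale copy is
   written back after the add, wiping the flag the add just set. */
static uint32_t stale_restore(void)
{
    fpscr = 0;
    uint32_t r1 = fpscr;          /* sts fpscr,r1            */
    r1 ^= MODE_MASK;              /* xor r2,r1               */
    fpscr = r1;                   /* lds r1,fpscr            */
    (void)model_fadd(1.0f, 2.0f); /* fadd                    */
    r1 ^= MODE_MASK;              /* xor on the stale copy   */
    fpscr = r1;                   /* lds r1,fpscr            */
    return fpscr;
}

/* Sequence with the dependency visible: fpscr is re-read after the
   add, so the flag survives the switch back. */
static uint32_t fresh_restore(void)
{
    fpscr = 0;
    uint32_t r1 = fpscr;          /* sts fpscr,r1            */
    r1 ^= MODE_MASK;              /* xor r2,r1               */
    fpscr = r1;                   /* lds r1,fpscr            */
    (void)model_fadd(1.0f, 2.0f); /* fadd                    */
    r1 = fpscr;                   /* sts fpscr,r1 (re-read)  */
    r1 ^= MODE_MASK;              /* xor r2,r1               */
    fpscr = r1;                   /* lds r1,fpscr            */
    return fpscr;
}

/* Small self-check of the two sequences. */
static void check_model(void)
{
    assert(stale_restore() == 0u);            /* flag lost */
    assert(fresh_restore() == FLAG_INEXACT);  /* flag kept */
}
```

Calling `check_model()` from a test driver should pass both assertions. In the model, the only difference between the two functions is the extra read of fpscr after the add, which is what the second `sts fpscr,r1` in the asm output corresponds to.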