https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53513
--- Comment #14 from Oleg Endo <olegendo at gcc dot gnu.org> ---
Created attachment 33690
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33690&action=edit
a possible patch

This is a simple patch that does sts-lds fpscr mode switching (not fully
tested). With the patch applied, the following example

    float test (float x, float y)
    {
      return x + y;
    }

compiled with -O2 -m4 (default FPU mode = double) gives:

        sts     fpscr,r1        ! 20    fpu_switch/8
        mov.l   .L2,r2          ! 22    movsi_ie/1
        xor     r2,r1           ! 23    *xorsi3_compact/2
        lds     r1,fpscr        ! 25    fpu_switch/5
        xor     r2,r1           ! 29    *xorsi3_compact/2
        fmov    fr5,fr0         ! 34    movsf_ie/1
        fadd    fr4,fr0         ! 8     addsf3_i
        rts                     ! 37    *return_i
        lds     r1,fpscr        ! 31    fpu_switch/5
.L3:
        .align 2
.L2:
        .long   524288

The switch is done with 3 (+2 artificial) individual instructions (load,
modify, store). In this case the RA / optimizers figure out that there's no
need to read fpscr (sts) a second time for the switch back, and they reorder
the operations. This happens because all the fp insn patterns in the machine
description only "use" fpscr, although they actually also modify it.

As a result, the FP environment is reset after the 'fadd': the final lds
writes back a value computed from the fpscr that was read before the fadd,
which potentially clears any exception flags the fadd set. I think this is
wrong. It also seems impossible to get the fpscr value immediately after the
fp insn, as it always gets reordered in some way.

As far as I understand, all the fp insns that update bits in fpscr should
say so in their patterns (clobber it or set it in some way), and a
"get_fpscr" builtin is required so that the optimizers see the dependencies
on fpscr.