------- Comment #7 from uros at kss-loka dot si  2006-05-31 10:56 -------
IMO the fact that gcc 3.x beats 4.x on this code could be attributed to pure luck.
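For anyone who wants to look at this pattern in isolation, here is a minimal sketch. It is hypothetical and is not the code attached to this PR: the function name mul_elem and the constant ELEM_INDEX are made up. It only assumes what the dumps in this comment show, namely one element of pA0 at byte offset 960 (index 120 for 8-byte doubles) being multiplied by rB0, which is held in an x87 register. Compiling it on x86-64 with something like `gcc -O2 -mfpmath=387 -DNDEBUG -S` under 3.x and under 4.x is one way to check which load/multiply sequence each version emits. Note that a single multiply in isolation may not reproduce the register-stack pressure of the real loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical index: 120 * sizeof(double) == 960 bytes, the same
   displacement as (const_int 960) in the 4.x dumps. */
#define ELEM_INDEX 120

/* Hypothetical kernel: multiply one element of pA0 by rB0.
   With x87 math, gcc can emit this either as a load of the array
   element followed by a register-register fmul, or as a copy of rB0
   (fld %st(0)) followed by an fmull with a memory operand. */
double mul_elem(const double *pA0, double rB0)
{
    /* Debug-only check; build with -DNDEBUG when reading the asm so
       that the check does not show up in the generated code. */
    assert(pA0 != NULL);
    return rB0 * pA0[ELEM_INDEX];
}
```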
Looking into the 3.x RTL, the following can be observed.

The instruction that multiplies pA0 and rB0 is described as:

__.20.combine:

(insn 75 73 76 2 (set (reg:DF 84)
        (mult:DF (mem:DF (reg/v/f:DI 70 [ pA0 ]) [0 S8 A64])
            (reg/v:DF 78 [ rB0 ]))) 551 {*fop_df_comm_nosse} (insn_list 65 (nil))
    (nil))

At this point, the first input operand does not satisfy the operand constraint, so the register allocator loads the memory operand into a register:

__.25.greg:

(insn 703 73 75 2 (set (reg:DF 8 st [84])
        (mem:DF (reg/v/f:DI 0 ax [orig:70 pA0 ] [70]) [0 S8 A64])) 96 {*movdf_integer} (nil)
    (nil))

(insn 75 703 76 2 (set (reg:DF 8 st [84])
        (mult:DF (reg:DF 8 st [84])
            (reg/v:DF 9 st(1) [orig:78 rB0 ] [78]))) 551 {*fop_df_comm_nosse} (insn_list 65 (nil))
    (nil))

This RTL produces the following asm sequence:

        fldl    (%rax)  #* pA0
        fmul    %st(1), %st     #

In the 4.x case, we have:

__.127r.combine:

(insn 60 58 61 4 (set (reg:DF 207)
        (mult:DF (reg/v:DF 187 [ rB0 ])
            (mem:DF (plus:DI (reg/v/f:DI 178 [ pA0.161 ])
                    (const_int 960 [0x3c0])) [0 S8 A64]))) 591 {*fop_df_comm_i387} (nil)
    (nil))

This instruction almost satisfies the operand constraint, and the register allocator produces:

__.138r.greg:

(insn 470 58 60 5 (set (reg:DF 12 st(4) [207])
        (reg/v:DF 8 st [orig:187 rB0 ] [187])) 94 {*movdf_integer} (nil)
    (nil))

(insn 60 470 61 5 (set (reg:DF 12 st(4) [207])
        (mult:DF (reg:DF 12 st(4) [207])
            (mem:DF (plus:DI (reg/v/f:DI 0 ax [orig:178 pA0.161 ] [178])
                    (const_int 960 [0x3c0])) [0 S8 A64]))) 591 {*fop_df_comm_i387} (nil)
    (nil))

Stack handling then fixes this RTL to:

__.151r.stack:

(insn 470 58 60 4 (set (reg:DF 8 st)
        (reg:DF 8 st)) 94 {*movdf_integer} (nil)
    (nil))

(insn 60 470 61 4 (set (reg:DF 8 st)
        (mult:DF (reg:DF 8 st)
            (mem:DF (plus:DI (reg/v/f:DI 0 ax [orig:178 pA0.161 ] [178])
                    (const_int 960 [0x3c0])) [0 S8 A64]))) 591 {*fop_df_comm_i387} (nil)
    (nil))

From your measurements, it looks like instead of:

        fld     %st(0)  #
        fmull   (%rax)  #* pA0.161

it is faster to emit:

        fldl    (%rax)  #* pA0
        fmul    %st(1), %st     #,

-- 
uros at kss-loka dot si
changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |uros at kss-loka dot si


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827