"Fu, Chao-Ying" <[EMAIL PROTECTED]> writes: > After tracing GCC 4.x to see why MADD is not generated for MIPS32, > I found out the main issue is that the pattern "adddi3" > is not available for MIPS32. Because the missing > of adddi3, GCC 4.x needs to split 64-bit addition to 4 separate > RTL insns. This leads to that the combining phase fails > to combine RTL insns to a single madd pattern. > > Could we enable "adddi3" for MIPS32 in GCC 4.x? Or is there a > better way to generate MADD? Thanks a lot!

The problem with:

> Ex: (mips.md in GCC 3.4)
> (define_expand "adddi3"
>   [(parallel [(set (match_operand:DI 0 "register_operand" "")
>                    (plus:DI (match_operand:DI 1 "register_operand" "")
>                             (match_operand:DI 2 "arith_operand" "")))
>               (clobber (match_dup 3))])]
>   "TARGET_64BIT || (!TARGET_DEBUG_G_MODE && !TARGET_MIPS16)"
> {
> ....
>
> (define_insn "adddi3_internal_1"
>   [(set (match_operand:DI 0 "register_operand" "=d,&d")
>         (plus:DI (match_operand:DI 1 "register_operand" "0,d")
>                  (match_operand:DI 2 "register_operand" "d,d")))
>    (clobber (match_operand:SI 3 "register_operand" "=d,d"))]
>   "!TARGET_64BIT && !TARGET_DEBUG_G_MODE && !TARGET_MIPS16"
> {
>   return (REGNO (operands[0]) == REGNO (operands[1])
>           && REGNO (operands[0]) == REGNO (operands[2]))
>     ? "srl\t%3,%L0,31\;sll\t%M0,%M0,1\;sll\t%L0,%L1,1\;addu\t%M0,%M0,%3"
>     : "addu\t%L0,%L1,%L2\;sltu\t%3,%L0,%L2\;addu\t%M0,%M1,%M2\;addu\t%M0,%M0,%3";
> }
>   [(set_attr "type" "darith")
>    (set_attr "mode" "DI")
>    (set_attr "length" "16")])

...this was that it tended to be very poor for the additions
themselves. When optabs.c implements the additions instead, the
early RTL optimisers get to see the individual instructions, and
are often able to handle constant or part-constant operands better.
This led to a noticeable size improvement when I tested it originally.
(I imagine the effects are even better now, thanks to the subreg
lowering pass.) See:

    http://gcc.gnu.org/ml/gcc-patches/2004-05/msg00947.html

for the patch that made this change, and some rationale.

As far as madd goes, I think it would be better to either (a) get
combine to handle this situation or (b) get expand to generate a
fused multiply-add from the outset.

(b) sounds like it might be useful in its own right. At the moment
we treat the generation of floating-point multiply-adds as an
optimisation, but in some applications it's critical not to round
the intermediate result. (I don't know if there's a bugzilla entry
about this.)
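
To make the rounding point concrete, here is a small C sketch. It does not use a real fused instruction: `fused_like` is a made-up stand-in that relies on the product of two floats being exact in double, so the product is not rounded to float before the addition. Both function names are hypothetical. (C99 provides `fma()` and `fmaf()` in `<math.h>` for the real operation.)

```c
/* Multiply, round the product to float, then add: the intermediate
   result is rounded before the addition happens. */
float mul_then_add(float a, float b, float c)
{
    /* volatile forces the product to be stored as a rounded float */
    volatile float product = a * b;
    return product + c;
}

/* Stand-in for a fused multiply-add (an assumption for illustration):
   the product of two floats fits exactly in a double, so no rounding
   to float happens until the final conversion. */
float fused_like(float a, float b, float c)
{
    double exact_product = (double)a * (double)b;
    return (float)(exact_product + (double)c);
}
```

With a = 1 + 2^-13, b = 1 - 2^-13 and c = -1, the exact product is 1 - 2^-26, which rounds to 1.0 as a float. So `mul_then_add` returns 0 while `fused_like` returns -2^-26: the whole answer lives in the bits that the intermediate rounding throws away.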

If we treated fused multiply-add as a primitive operation, we could
extend it to integer types too. In this case we'd also need to handle
widening multiplications, but we already need to do that for
stand-alone multiplications.

Just random musings, and probably not the answer you wanted to hear,
sorry.

Richard