On 11/10/2009 05:48 AM, Mohamed Shafi wrote:
> (define_insn "mulsi3"
>   [(set (match_operand:SI 0 "register_operand" "=&d")
>         (mult:SI (match_operand:SI 1 "register_operand" "%d")
>                  (match_operand:SI 2 "register_operand" "d")))]
Note that "%" is only useful if the constraints for the two operands are
different (e.g. only one operand accepts an immediate input). When
they're identical, you simply waste CPU cycles asking reload to try the
operands in the other order.
> [(set (match_dup 0)
>       (ashift:SI
>         (plus:SI (mult:HI (unspec:HI [(match_dup 2)] UNSPEC_REG_LOW)
>                           (unspec:HI [(match_dup 1)] UNSPEC_REG_HIGH))
>                  (mult:HI (unspec:HI [(match_dup 2)] UNSPEC_REG_HIGH)
>                           (unspec:HI [(match_dup 1)] UNSPEC_REG_LOW)))
>         (const_int 16)))
>  (set (match_dup 0)
>       (plus:SI (match_dup 0)
>                (mult:HI (unspec:HI [(match_dup 2)] UNSPEC_REG_LOW)
>                         (unspec:HI [(match_dup 1)] UNSPEC_REG_LOW))))]
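Well, for one, your modes don't match: the HImode MULTs sit directly
under SImode PLUS and ASHIFT, and the HImode unspecs wrap SImode
operands. You actually want your unspecs and MULTs to be SImode.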
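To make the mode point concrete, here is a small C sketch of what the two modes describe for a partial product. It is only an arithmetic model, not GCC code, and the function names are made up for illustration. A 16x16 multiply can need up to 32 bits, and the low-half product that the second insn adds in needs all of them.

```c
#include <stdint.h>

/* What (mult:HI a b) describes: the product truncated to 16 bits.
   The cast to uint32_t avoids signed overflow after integer promotion. */
static uint16_t mul_himode(uint16_t a, uint16_t b)
{
    return (uint16_t)((uint32_t)a * (uint32_t)b);
}

/* What (mult:SI (zero_extend:SI a) (zero_extend:SI b)) describes:
   the full 32-bit product of two 16-bit values. */
static uint32_t mul_simode(uint16_t a, uint16_t b)
{
    return (uint32_t)a * (uint32_t)b;
}
```

Calling both with the same inputs shows that mul_himode() returns only the low 16 bits of what mul_simode() returns.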
You could probably usefully model the second insn as
(define_insn "mulsi3_part2"
  [(set (match_operand:SI 0 "register_operand" "=d")
        (plus:SI
          (mult:SI (zero_extend:SI
                     (match_operand:HI 1 "register_operand" "d"))
                   (zero_extend:SI
                     (match_operand:HI 2 "register_operand" "d")))
          (match_operand:SI 3 "register_operand" "0")))]
  ""
  ...)
(define_expand "umulhisi3"
  [(set (match_operand:SI 0 "register_operand" "")
        (mult:SI (zero_extend:SI
                   (match_operand:HI 1 "register_operand" ""))
                 (zero_extend:SI
                   (match_operand:HI 2 "register_operand" ""))))]
  ""
{
  /* The zero_extend form is the unsigned widening multiply, hence the
     "u" name.  It is the multiply-add insn with a zero addend.  */
  emit_insn (gen_mulsi3_part2 (operands[0], operands[1], operands[2],
                               force_reg (SImode, const0_rtx)));
  DONE;
})
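As a rough C model of the idea in the expander above (hypothetical function names, arithmetic only): the second insn behaves like a 16x16 multiply-accumulate into a 32-bit register, and the widening multiply falls out by feeding it a zero accumulator.

```c
#include <stdint.h>

/* Models mulsi3_part2: result = zext(a) * zext(b) + acc, where acc is
   the operand tied to the output by the "0" constraint.
   Unsigned arithmetic wraps modulo 2^32, like the SImode PLUS. */
static uint32_t mac16(uint16_t a, uint16_t b, uint32_t acc)
{
    return (uint32_t)a * (uint32_t)b + acc;
}

/* Models the expander: a widening multiply is the multiply-accumulate
   with the accumulator forced to zero (force_reg of const0_rtx). */
static uint32_t umul16x16(uint16_t a, uint16_t b)
{
    return mac16(a, b, 0);
}
```

The cost of this approach is one extra register load of zero, which the optimizers may or may not be able to share between multiplies.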
The first insn *could* be modeled without unspec, but the general
utility of doing so is questionable. On the other hand, it doesn't hurt.
(define_insn "mulsi3_part1"
  [(set (match_operand:SI 0 "register_operand" "=d")
        (ashift:SI
          (plus:SI
            (mult:SI
              (zero_extract:SI
                (match_operand:SI 1 "register_operand" "d")
                (const_int 16)
                (const_int 0))
              (zero_extract:SI
                (match_operand:SI 2 "register_operand" "d")
                (const_int 16)
                (const_int 16)))
            (mult:SI
              (zero_extract:SI
                (match_dup 1)
                (const_int 16)
                (const_int 16))
              (zero_extract:SI
                (match_dup 2)
                (const_int 16)
                (const_int 0))))
          (const_int 16)))]
  ""
  ...)
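For what it's worth, the arithmetic behind the two-insn sequence can be checked with a short C sketch. This is only a model under stated assumptions: the function names are hypothetical, zero_extract position 0 is taken to be the least significant bit (i.e. BITS_BIG_ENDIAN is zero), and the second insn is assumed to be fed the low halves of the two SImode inputs.

```c
#include <stdint.h>

/* Models (zero_extract:SI x (const_int 16) (const_int pos)),
   assuming bit position 0 is the least significant bit. */
static uint32_t zext16(uint32_t x, unsigned pos)
{
    return (x >> pos) & 0xffffu;
}

/* Models mulsi3_part1: (lo(a)*hi(b) + hi(a)*lo(b)) << 16. */
static uint32_t part1(uint32_t a, uint32_t b)
{
    uint32_t cross = zext16(a, 0) * zext16(b, 16)
                   + zext16(a, 16) * zext16(b, 0);
    return cross << 16;
}

/* Models mulsi3_part2 applied to the low halves: lo(a)*lo(b) + acc. */
static uint32_t part2(uint32_t a, uint32_t b, uint32_t acc)
{
    return zext16(a, 0) * zext16(b, 0) + acc;
}

/* The full sequence: part1 into the destination, then part2 on top.
   The hi(a)*hi(b) term is omitted because it only affects bits 32
   and up, which a 32-bit result discards. */
static uint32_t mul32_split(uint32_t a, uint32_t b)
{
    return part2(a, b, part1(a, b));
}
```

mul32_split(a, b) should agree with the low 32 bits of a * b for any inputs, which is the property the splitter relies on.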
It does appear that you could help the register allocator out by
splitting this pattern before reload. I would hope that the gimple
optimizers are good enough that you'd get good code simply emitting the
two insns immediately at expand time, but I can imagine that there are
places in the rtl optimizers that would be unhappy not being able to
generate a plain multiply pattern. So retaining the splitter is likely
to be best.
But beyond that we can't help without knowing what "creating problems"
means.
r~