Hi,
On Mon, 5 Mar 2007, Roman Zippel wrote:
> > Yes, it is in general better now to split double-word length
> > operations before reload. It's not necessarily better to split as
> > early as possible, as that will essentially disable the RTL level loop
> > optimizations.
>
> I was worried about that too, since some patterns would be more
> complicated, which may make some optimizations more difficult, e.g. where
> I noticed it first was mulsidi3, which currently looks like this:
Ok, I played a little more with umulsidi3 (see attached patch, only the
umulsidi3 pattern are relevant for this) and I think found a bug somwhere,
although I'm not sure whether the subreg pass causes it or only trigggers
it (it goes away with -fno-split-wide-types).
I removed the '%' from the constraint to test proper reloads (which may
not be here, but probably in other cases) and used the following testcase:
unsigned long long f(void)
{
register unsigned int a asm ("%d0");
register unsigned int b asm ("%d1");
asm ("# %0 %1" : "=d" (a), "=d" (b));
return ((unsigned long long)b * (unsigned long long)a);
}
This is the situation prior to subreg2:
(insn 32 10 30 2 (parallel [
(set (strict_low_part (subreg:SI (reg:DI 33) 4))
(mult:SI (reg/v:SI 0 %d0 [ a ])
(reg/v:SI 1 %d1 [ b ])))
(set (strict_low_part (subreg:SI (reg:DI 33) 0))
(truncate:SI (lshiftrt:DI (mult:DI (zero_extend:DI (reg/v:SI 0
%d0 [ a ]))
(zero_extend:DI (reg/v:SI 1 %d1 [ b ])))
(const_int 32 [0x20]))))
]) -1 (nil)
(expr_list:REG_DEAD (reg/v:SI 0 %d0 [ a ])
(expr_list:REG_DEAD (reg/v:SI 1 %d1 [ b ])
(nil))))
(insn 30 32 31 2 (set (reg:SI 0 %d0 [ <result> ])
(subreg:SI (reg:DI 33) 0)) 31 {*m68k.md:673} (insn_list:REG_DEP_TRUE 11
(nil))
(nil))
(insn 31 30 23 2 (set (reg:SI 1 %d1 [orig:0 <result>+4 ] [0])
(subreg:SI (reg:DI 33) 4)) 31 {*m68k.md:673} (nil)
(expr_list:REG_DEAD (reg:DI 33)
(nil)))
(insn 23 31 0 2 (use (reg/i:DI 0 %d0 [ <result> ])) -1 (insn_list:REG_DEP_TRUE
30 (nil))
(nil))
subreg2 produces this (which looks correct):
(insn 32 10 30 2 (parallel [
(set (strict_low_part (reg:SI 38 [+4 ]))
(mult:SI (reg/v:SI 0 %d0 [ a ])
(reg/v:SI 1 %d1 [ b ])))
(set (strict_low_part (reg:SI 37))
(truncate:SI (lshiftrt:DI (mult:DI (zero_extend:DI (reg/v:SI 0
%d0 [ a ]))
(zero_extend:DI (reg/v:SI 1 %d1 [ b ])))
(const_int 32 [0x20]))))
]) 181 {*umulsidi3_subreg} (nil)
(expr_list:REG_DEAD (reg/v:SI 0 %d0 [ a ])
(expr_list:REG_DEAD (reg/v:SI 1 %d1 [ b ])
(nil))))
(insn 30 32 31 2 (set (reg:SI 0 %d0 [ <result> ])
(reg:SI 37)) 31 {*m68k.md:673} (insn_list:REG_DEP_TRUE 11 (nil))
(expr_list:REG_DEAD (reg:SI 37)
(nil)))
(insn 31 30 23 2 (set (reg:SI 1 %d1 [orig:0 <result>+4 ] [0])
(reg:SI 38 [+4 ])) 31 {*m68k.md:673} (nil)
(expr_list:REG_DEAD (reg:SI 38 [+4 ])
(nil)))
(insn 23 31 0 2 (use (reg/i:DI 0 %d0 [ <result> ])) -1 (insn_list:REG_DEP_TRUE
30 (nil))
(nil))
Reload has now to match (reg %d0) and (reg 38) above in insn 32 and after
pseudo register replacement it looks like this:
(insn 32 10 30 2 (parallel [
(set (strict_low_part (reg:SI 1 %d1 [orig:38+4 ] [38]))
(mult:SI (reg/v:SI 0 %d0 [ a ])
(reg/v:SI 1 %d1 [ b ])))
(set (strict_low_part (reg:SI 0 %d0 [37]))
(truncate:SI (lshiftrt:DI (mult:DI (zero_extend:DI (reg/v:SI 0
%d0 [ a ]))
(zero_extend:DI (reg/v:SI 1 %d1 [ b ])))
(const_int 32 [0x20]))))
]) 181 {*umulsidi3_subreg} (nil)
(expr_list:REG_DEAD (reg/v:SI 0 %d0 [ a ])
(expr_list:REG_DEAD (reg/v:SI 1 %d1 [ b ])
(nil))))
Notice that the REG_DEAD notes are not correct anymore, reload produces
now the following reload:
Reloads for insn # 32
Reload 0: reload_in (SI) = (reg/v:SI 0 %d0 [ a ])
reload_out (SI) = (reg:SI 1 %d1 [orig:38+4 ] [38])
DATA_REGS, RELOAD_OTHER (opnum = 0)
reload_in_reg: (reg/v:SI 0 %d0 [ a ])
reload_out_reg: (reg:SI 1 %d1 [orig:38+4 ] [38])
reload_reg_rtx: (reg/v:SI 0 %d0 [ a ])
The problem here is it can't use %d0 as reload register (but it does it
anyway, because it thinks it's dead), as it's used for (reg 37) and reload
produces this:
(insn 32 10 34 2 (parallel [
(set (strict_low_part (reg/v:SI 0 %d0 [ a ]))
(mult:SI (reg/v:SI 0 %d0 [ a ])
(reg/v:SI 1 %d1 [ b ])))
(set (strict_low_part (reg:SI 0 %d0 [37]))
(truncate:SI (lshiftrt:DI (mult:DI (zero_extend:DI (reg/v:SI 0
%d0 [ a ]))
(zero_extend:DI (reg/v:SI 1 %d1 [ b ])))
(const_int 32 [0x20]))))
]) 181 {*umulsidi3_subreg} (nil)
(nil))
%d0 is used in both set destinations and postreload dies because of it.
The question is now, who is responsible for telling reload that %d0/%d1
are not really dead anymore after register allocation?
bye, Roman
---
gcc/config/m68k/m68k.md | 104 ++++++++++++++++++++++++------------------------
1 file changed, 54 insertions(+), 50 deletions(-)
Index: gcc/gcc/config/m68k/m68k.md
===================================================================
--- gcc.orig/gcc/config/m68k/m68k.md
+++ gcc/gcc/config/m68k/m68k.md
@@ -2672,77 +2672,81 @@
;; proper matching constraint. This is because the matching is between
;; the high-numbered word of the DImode operand[0] and operand[1].
(define_expand "umulsidi3"
+ [(set (match_operand:DI 0 "register_operand" "")
+ (mult:DI (zero_extend:DI (match_operand:SI 1 "register_operand" ""))
+ (zero_extend:DI (match_operand:SI 2 "register_operand" ""))))]
+ "TARGET_68020 && !TUNE_68060 && !TARGET_COLDFIRE"
+ "")
+
+(define_insn_and_split "*umulsidi3"
+ [(set (match_operand:DI 0 "register_operand" "=d")
+ (mult:DI (zero_extend:DI (match_operand:SI 1 "register_operand" "%0"))
+ (zero_extend:DI (match_operand:SI 2 "nonimmediate_operand"
"dm"))))]
+ "TARGET_68020 && !TUNE_68060 && !TARGET_COLDFIRE"
+ "#"
+ ""
[(parallel
- [(set (subreg:SI (match_operand:DI 0 "register_operand" "") 4)
- (mult:SI (match_operand:SI 1 "register_operand" "")
- (match_operand:SI 2 "register_operand" "")))
- (set (subreg:SI (match_dup 0) 0)
+ [(set (strict_low_part (match_dup 3))
+ (mult:SI (match_dup 1) (match_dup 2)))
+ (set (strict_low_part (match_dup 4))
(truncate:SI (lshiftrt:DI (mult:DI (zero_extend:DI (match_dup 1))
(zero_extend:DI (match_dup 2)))
(const_int 32))))])]
- "TARGET_68020 && !TUNE_68060 && !TARGET_COLDFIRE"
- "")
+ "split_di(operands, 1, operands + 3, operands + 4);")
-(define_insn ""
- [(set (match_operand:SI 0 "register_operand" "=d")
- (mult:SI (match_operand:SI 1 "register_operand" "%0")
- (match_operand:SI 2 "nonimmediate_operand" "dm")))
- (set (match_operand:SI 3 "register_operand" "=d")
- (truncate:SI (lshiftrt:DI (mult:DI (zero_extend:DI (match_dup 1))
- (zero_extend:DI (match_dup 2)))
+(define_insn "*umulsidi3_subreg"
+ [(set (strict_low_part (match_operand:SI 0 "register_operand" "=d"))
+ (mult:SI (match_operand:SI 2 "register_operand" "0")
+ (match_operand:SI 3 "nonimmediate_operand" "dm")))
+ (set (strict_low_part (match_operand:SI 1 "register_operand" "+d"))
+ (truncate:SI (lshiftrt:DI (mult:DI (zero_extend:DI (match_dup 2))
+ (zero_extend:DI (match_dup 3)))
(const_int 32))))]
"TARGET_68020 && !TUNE_68060 && !TARGET_COLDFIRE"
- "mulu%.l %2,%3:%0")
+ "mulu%.l %3,%1:%0")
; Match immediate case. For 2.4 only match things < 2^31.
; It's tricky with larger values in these patterns since we need to match
; values between the two parallel multiplies, between a CONST_DOUBLE and
; a CONST_INT.
-(define_insn ""
- [(set (match_operand:SI 0 "register_operand" "=d")
- (mult:SI (match_operand:SI 1 "register_operand" "%0")
- (match_operand:SI 2 "const_int_operand" "n")))
- (set (match_operand:SI 3 "register_operand" "=d")
- (truncate:SI (lshiftrt:DI (mult:DI (zero_extend:DI (match_dup 1))
- (match_dup 2))
- (const_int 32))))]
+;;(define_insn "*umulsidi3_const"
+;; [(set (match_operand:DI 0 "register_operand" "=d")
+;; (mult:DI (zero_extend:DI (match_operand:SI 1 "register_operand" "0"))
+;; (match_operand:SI 2 "const_int_operand" "n")))]
+;; "TARGET_68020 && !TUNE_68060 && !TARGET_COLDFIRE"
+;; "mulu%.l %2,%0:%R0")
+
+(define_insn "*umulsidi3_const"
+ [(set (match_operand:DI 0 "register_operand" "=d")
+ (mult:DI (zero_extend:DI (match_operand:SI 1 "register_operand" "%0"))
+ (match_operand:DI 2 "const_double_operand" "n")))]
"TARGET_68020 && !TUNE_68060 && !TARGET_COLDFIRE
- && (unsigned) INTVAL (operands[2]) <= 0x7fffffff"
- "mulu%.l %2,%3:%0")
+ && ((CONST_INT_P (operands[2])
+ && INTVAL (operands[2]) >= 0)
+ || (GET_CODE (operands[2]) == CONST_DOUBLE
+ && CONST_DOUBLE_HIGH (operands[2]) == 0))"
+ "mulu%.l %2,%0:%R0")
(define_expand "mulsidi3"
- [(parallel
- [(set (subreg:SI (match_operand:DI 0 "register_operand" "") 4)
- (mult:SI (match_operand:SI 1 "register_operand" "")
- (match_operand:SI 2 "register_operand" "")))
- (set (subreg:SI (match_dup 0) 0)
- (truncate:SI (lshiftrt:DI (mult:DI (sign_extend:DI (match_dup 1))
- (sign_extend:DI (match_dup 2)))
- (const_int 32))))])]
+ [(set (match_operand:DI 0 "register_operand" "")
+ (mult:DI (sign_extend:DI (match_operand:SI 1 "register_operand" ""))
+ (sign_extend:DI (match_operand:SI 2 "register_operand" ""))))]
"TARGET_68020 && !TUNE_68060 && !TARGET_COLDFIRE"
"")
-(define_insn ""
- [(set (match_operand:SI 0 "register_operand" "=d")
- (mult:SI (match_operand:SI 1 "register_operand" "%0")
- (match_operand:SI 2 "nonimmediate_operand" "dm")))
- (set (match_operand:SI 3 "register_operand" "=d")
- (truncate:SI (lshiftrt:DI (mult:DI (sign_extend:DI (match_dup 1))
- (sign_extend:DI (match_dup 2)))
- (const_int 32))))]
+(define_insn "*mulsidi3"
+ [(set (match_operand:DI 0 "register_operand" "=d")
+ (mult:DI (sign_extend:DI (match_operand:SI 1 "register_operand" "%0"))
+ (sign_extend:DI (match_operand:SI 2 "nonimmediate_operand"
"dm"))))]
"TARGET_68020 && !TUNE_68060 && !TARGET_COLDFIRE"
- "muls%.l %2,%3:%0")
+ "muls%.l %2,%0:%R0")
-(define_insn ""
- [(set (match_operand:SI 0 "register_operand" "=d")
- (mult:SI (match_operand:SI 1 "register_operand" "%0")
- (match_operand:SI 2 "const_int_operand" "n")))
- (set (match_operand:SI 3 "register_operand" "=d")
- (truncate:SI (lshiftrt:DI (mult:DI (sign_extend:DI (match_dup 1))
- (match_dup 2))
- (const_int 32))))]
+(define_insn "*mulsidi3_const"
+ [(set (match_operand:DI 0 "register_operand" "=d")
+ (mult:DI (sign_extend:DI (match_operand:SI 1 "register_operand" "%0"))
+ (match_operand:SI 2 "const_int_operand" "n")))]
"TARGET_68020 && !TUNE_68060 && !TARGET_COLDFIRE"
- "muls%.l %2,%3:%0")
+ "muls%.l %2,%0:%R0")
(define_expand "umulsi3_highpart"
[(parallel