On Sat, Aug 20, 2011 at 11:52 PM, Richard Henderson <[email protected]> wrote:
> On 08/20/2011 02:16 PM, Uros Bizjak wrote:
>> +(define_insn "bmi2_umul<mode><dwi>3_1"
>> + [(set (match_operand:<DWI> 0 "register_operand" "=r")
>> + (mult:<DWI>
>> + (zero_extend:<DWI>
>> + (match_operand:DWIH 1 "nonimmediate_operand" "%d"))
>> + (zero_extend:<DWI>
>> + (match_operand:DWIH 2 "nonimmediate_operand" "rm"))))]
>> + "TARGET_BMI
>> + && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
>> + "mulx\t{%2, %M0, %N0|%N0, %M0, %2}"
>> + [(set_attr "type" "imul")
>> + (set_attr "prefix" "vex")
>> + (set_attr "mode" "<MODE>")])
>
> You can do better than this, and avoid the %M %N specifiers.
> The outputs are truly independent and do not need to be a pair.
>
> See the mn10300 umulsidi3{,_internal} patterns.

I have tried your suggestion, using patterns like the following:

(define_insn "umulsidi3_1"
  [(set (match_operand:SI 0 "register_operand" "=a,r")
        (mult:SI
          (match_operand:SI 2 "nonimmediate_operand" "%0,d")
          (match_operand:SI 3 "nonimmediate_operand" "rm,rm")))
   (set (match_operand:SI 1 "register_operand" "=d,r")
        (truncate:SI
          (lshiftrt:DI
            (mult:DI (zero_extend:DI (match_dup 2))
                     (zero_extend:DI (match_dup 3)))
            (const_int 32))))
   (clobber (reg:CC FLAGS_REG))]
  "!TARGET_64BIT
   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
  "@
   mull\t%3
   #"
  [(set_attr "isa" "base,bmi2")
   (set_attr "type" "imul,imulx")
   (set_attr "length_immediate" "0,*")
   (set (attr "athlon_decode")
        (cond [(eq_attr "alternative" "0")
                 (if_then_else (eq_attr "cpu" "athlon")
                   (const_string "vector")
                   (const_string "double"))]
              (const_string "*")))
   (set_attr "amdfam10_decode" "double,*")
   (set_attr "bdver1_decode" "direct,*")
   (set_attr "prefix" "orig,vex")
   (set_attr "mode" "SI")])
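
For readers less used to RTL, the two SETs in the pattern above can be modelled roughly in C. This is only an illustrative sketch: umulsidi3_model is a hypothetical helper name, not anything in GCC, and it assumes the usual case where uint32_t is unsigned int.

```c
#include <stdint.h>

/* Hypothetical model of the two parallel SETs; not part of GCC. */
static void umulsidi3_model(uint32_t a, uint32_t b,
                            uint32_t *lo, uint32_t *hi)
{
    /* First SET: (mult:SI a b), i.e. the low 32 bits of the product. */
    *lo = (uint32_t)(a * b);

    /* Second SET: (truncate:SI (lshiftrt:DI (mult:DI (zero_extend:DI a)
       (zero_extend:DI b)) (const_int 32))), i.e. the high 32 bits. */
    *hi = (uint32_t)(((uint64_t)a * (uint64_t)b) >> 32);
}
```

The two results are written through separate pointers, which mirrors the point that operands 0 and 1 are independent registers rather than the halves of one DImode pair.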

The compiler works, and for a couple of simple testcases it produces
the same code as with register pairs. However, there are a couple of
problems:

- Various length calculations look into operand{0,1,2} to determine
  the instruction length. This is fixable with a little effort.

- Patterns that include (const_int N) do not macroize, and this leads
  to pattern explosion. For this simple example, in addition to
  splitting out the any_extend pattern, we also have to split the
  DWIH patterns.

In the past, I tried to use match_operand with const_int INTVAL
predicates, but gcc crashed elsewhere due to the additional operand.
Please see [1].

IMO, it is currently too much pain for too little gain to implement
split pairs in the existing patterns. I will, however, add a
post-reload split to the mulx pattern to the proposed pattern, in
order to avoid %M and %N.

[1] http://gcc.gnu.org/ml/gcc/2010-07/msg00143.html
Uros.