On 2020-02-09 19:02, Segher Boessenkool wrote:
On Sun, Feb 09, 2020 at 12:15:03PM +0100, m wrote:
On 2020-02-07 16:44, Segher Boessenkool wrote:
(define_insn "smulhshi3"
  [(set (match_operand:HI 0 "register_operand" "=r")
        (truncate:HI
          (ashiftrt:SI
            (mult:SI
              (sign_extend:SI (match_operand:HI 1 "register_operand" "r"))
              (sign_extend:SI (match_operand:HI 2 "register_operand" "r")))
            (const_int 15))))]
  "TARGET_PACKED_OPS"
  "mulq.h\\t%0, %1, %2")
However, I am unable to trigger this code path. I have tried with the
following C code:
short mulq(short op1, short op2) {
    return (short) (((int) op1 * (int) op2) >> (32 / 2 - 1));
}
But I just get the regular 4-instruction result (2x sign extend, 1x mul,
1x shift).
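For reference, here is a small sketch of the arithmetic the C function above asks for, spelled out step by step so that each statement lines up with one of the four instructions mentioned (two sign extensions, one multiply, one shift) plus the final narrowing to 16 bits. The helper name mulq_steps and the comments mapping statements to RTL codes are illustrative assumptions, not something taken from the port. The sketch also assumes a 32-bit int and an arithmetic right shift for negative values, which is what GCC provides.

```c
#include <assert.h>   /* static_assert */
#include <stdint.h>

/* Assumption: int is 32 bits wide, so the 16x16 product always fits. */
static_assert(sizeof(int) == 4, "sketch assumes a 32-bit int");

/* The function from the mail, unchanged apart from indentation. */
static short mulq(short op1, short op2) {
    return (short) (((int) op1 * (int) op2) >> (32 / 2 - 1));
}

/* Hypothetical helper: the same computation, one step per statement. */
static int16_t mulq_steps(int16_t op1, int16_t op2) {
    int32_t a = op1;               /* sign_extend:SI of the first HI operand  */
    int32_t b = op2;               /* sign_extend:SI of the second HI operand */
    int32_t prod = a * b;          /* mult:SI; |prod| <= 2^30, no overflow    */
    /* ashiftrt:SI by 15. Right-shifting a negative value is
       implementation-defined in C; GCC shifts arithmetically. */
    int32_t shifted = prod >> 15;
    return (int16_t) shifted;      /* keep only the low 16 bits */
}
```

With these definitions, mulq(16384, 16384) yields 8192, that is 0.5 * 0.5 = 0.25 when the operands are read as 16-bit fixed-point fractions with 15 fractional bits.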
What does -fdump-rtl-combine-all show it tried? *Did* it try anything?
Cool option. I'm not really sure how to read the output, though. The
closest it seems to try to match is this:
For every combination tried, it shows "Trying 2 -> 6:" etc., followed by
the instructions it started with (which is very important), and then
what worked and what didn't, and more debug information.
I usually need to see that whole block (everything until the next
"Trying:").
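As a rough aid for finding those blocks in a long dump, the sketch below tallies the three marker lines that appear in the combine output: lines starting with "Trying ", "Successfully matched this instruction:" and "Failed to match this instruction:". The struct and function names are hypothetical, and the marker strings are taken from the dump text itself rather than from any documented interface, so treat this as a convenience sketch only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tally of the marker lines in one combine dump. */
struct combine_tally {
    size_t attempts;   /* lines starting with "Trying "                */
    size_t matched;    /* "Successfully matched this instruction:"     */
    size_t failed;     /* "Failed to match this instruction:"          */
};

static int starts_with(const char *line, const char *prefix) {
    return strncmp(line, prefix, strlen(prefix)) == 0;
}

/* Walk a NUL-terminated dump text line by line and count the markers. */
static struct combine_tally tally_combine_dump(const char *text) {
    struct combine_tally t = {0, 0, 0};
    const char *line = text;

    assert(text != NULL);
    while (*line != '\0') {
        const char *p = line;
        const char *nl;

        while (*p == ' ' || *p == '\t')   /* skip leading whitespace */
            p++;
        if (starts_with(p, "Trying "))
            t.attempts++;
        else if (starts_with(p, "Successfully matched this instruction:"))
            t.matched++;
        else if (starts_with(p, "Failed to match this instruction:"))
            t.failed++;

        nl = strchr(line, '\n');
        if (nl == NULL)
            break;                         /* last line, no newline */
        line = nl + 1;
    }
    return t;
}
```

The number of "Trying" lines can then be compared against the "attempts" figure that the pass prints in its "Combiner totals" line at the end of the dump.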
I've attached the full output of -fdump-rtl-combine-all.
Failed to match this instruction:
(set (reg:SI 85)
(ashiftrt:SI (mult:SI (sign_extend:SI (subreg:HI (reg:SI 86) 0))
(reg:SI 83 [ op2D.1381 ]))
(const_int 15 [0xf])))
It seems that it has already decided to split the instruction into
several operations (the truncate operation is not there, and the second
sign_extend:SI (subreg:HI ...) is also missing).
The code probably sign-extends the result; I need to see the full thing
to really know. Similarly, the sign_extend of a const_int is not
canonical RTL; it is always written as just a const_int.
You'll probably need to write a few extra patterns to recognise all the
options here.
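One concrete case from the attached dump: in the "Trying 12 -> 18" block, combine does not propose a truncate of the arithmetic shift. It proposes a zero_extract of 16 bits at position 15 and, alternatively, an and of a logical shift with 65535. The sketch below checks that these forms agree with "arithmetic shift by 15, then keep the low half" for the low 16 bits, which is one way to see why a pattern written with truncate:HI and ashiftrt:SI may never be what combine asks for. The function names are hypothetical, and the sketch assumes bit positions are counted from the least significant bit and that >> on a negative int32_t is arithmetic (as in GCC).

```c
#include <assert.h>
#include <stdint.h>

/* What insns 12 and 18 compute together: arithmetic shift right by 15,
   then keep the low 16 bits (the low-part subreg). */
static uint16_t low_half_of_ashiftrt(int32_t x) {
    /* >> on a negative value is implementation-defined; GCC shifts
       arithmetically. The conversion to uint16_t is modular. */
    return (uint16_t) (x >> 15);
}

/* What combine proposed instead: (and (lshiftrt x 15) 0xffff), which is
   the same as extracting 16 bits starting at bit 15. */
static uint16_t zero_extract_16_at_15(int32_t x) {
    return (uint16_t) (((uint32_t) x >> 15) & 0xffffu);
}

/* Hypothetical checker: both forms must give the same low half. */
static void check_same_low_half(int32_t x) {
    assert(low_half_of_ashiftrt(x) == zero_extract_16_at_15(x));
}
```

Both functions read only bits 15 to 30 of the input, so the sign copies that the arithmetic shift places in the upper bits never reach the 16-bit result.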
I haven't had much time to dig more into this, but I hope to do so soon.
Regards,
Marcus
;; Function mulq (mulq, funcdef_no=0, decl_uid=1382, cgraph_uid=1, symbol_order=0)
Pass statistics of "combine": ----------------
scanning new insn with uid = 21.
rescanning insn with uid = 2.
scanning new insn with uid = 22.
rescanning insn with uid = 4.
starting the processing of deferred insns
ending the processing of deferred insns
df_analyze called
df_worklist_dataflow_doublequeue: n_basic_blocks 3 n_edges 2 count 3 ( 1)
mulq
Dataflow summary:
def_info->table_size = 20, use_info->table_size = 18
;; fully invalidated by EH 1 [s1] 2 [s2] 3 [s3] 4 [s4] 5 [s5] 6 [s6] 7 [s7] 8 [s8] 9 [s9] 10 [s10] 11 [s11] 12 [s12] 13 [s13] 14 [s14]
;; hardware regs used 28 [sp] 64 [?fp] 65 [?ap]
;; regular block artificial uses 26 [fp] 28 [sp] 64 [?fp] 65 [?ap]
;; eh block artificial uses 26 [fp] 28 [sp] 64 [?fp] 65 [?ap]
;; entry block defs 1 [s1] 2 [s2] 3 [s3] 4 [s4] 5 [s5] 6 [s6] 7 [s7] 8 [s8] 26 [fp] 28 [sp] 30 [lr] 64 [?fp] 65 [?ap]
;; exit block uses 1 [s1] 26 [fp] 28 [sp] 30 [lr] 64 [?fp]
;; regs ever live 1 [s1] 2 [s2]
;; ref usage r1={2d,3u} r2={1d,1u} r3={1d} r4={1d} r5={1d} r6={1d} r7={1d} r8={1d} r26={1d,2u} r28={1d,2u} r30={1d,1u} r64={1d,2u} r65={1d,1u} r78={1d,1u} r80={1d,1u} r82={1d,1u} r83={1d,1u} r84={1d,1u} r85={1d,1u} r86={1d,1u} r87={1d,1u}
;; total ref usage 42{22d,20u,0e} in 10{10 regular + 0 call} insns.
( )->[0]->( 2 )
;; bb 0 artificial_defs: { d1(1){ }d2(2){ }d3(3){ }d4(4){ }d5(5){ }d6(6){ }d7(7){ }d8(8){ }d9(26){ }d10(28){ }d11(30){ }d12(64){ }d13(65){ }}
;; bb 0 artificial_uses: { }
;; lr in
;; lr use
;; lr def 1 [s1] 2 [s2] 3 [s3] 4 [s4] 5 [s5] 6 [s6] 7 [s7] 8 [s8] 26 [fp] 28 [sp] 30 [lr] 64 [?fp] 65 [?ap]
;; live in
;; live gen 1 [s1] 2 [s2] 3 [s3] 4 [s4] 5 [s5] 6 [s6] 7 [s7] 8 [s8] 26 [fp] 28 [sp] 30 [lr] 64 [?fp] 65 [?ap]
;; live kill
;; lr out 1 [s1] 2 [s2] 26 [fp] 28 [sp] 30 [lr] 64 [?fp] 65 [?ap]
;; live out 1 [s1] 2 [s2] 26 [fp] 28 [sp] 30 [lr] 64 [?fp] 65 [?ap]
( 0 )->[2]->( 1 )
;; bb 2 artificial_defs: { }
;; bb 2 artificial_uses: { u0(26){ }u1(28){ }u2(64){ }u3(65){ }}
;; lr in 1 [s1] 2 [s2] 26 [fp] 28 [sp] 30 [lr] 64 [?fp] 65 [?ap]
;; lr use 1 [s1] 2 [s2] 26 [fp] 28 [sp] 64 [?fp] 65 [?ap]
;; lr def 1 [s1] 78 80 82 83 84 85 86 87
;; live in 1 [s1] 2 [s2] 26 [fp] 28 [sp] 30 [lr] 64 [?fp] 65 [?ap]
;; live gen 1 [s1] 78 80 82 83 84 85
;; live kill
;; lr out 1 [s1] 26 [fp] 28 [sp] 30 [lr] 64 [?fp] 65 [?ap]
;; live out 1 [s1] 26 [fp] 28 [sp] 30 [lr] 64 [?fp] 65 [?ap]
( 2 )->[1]->( )
;; bb 1 artificial_defs: { }
;; bb 1 artificial_uses: { u13(1){ }u14(26){ }u15(28){ }u16(30){ }u17(64){ }}
;; lr in 1 [s1] 26 [fp] 28 [sp] 30 [lr] 64 [?fp]
;; lr use 1 [s1] 26 [fp] 28 [sp] 30 [lr] 64 [?fp]
;; lr def
;; live in 1 [s1] 26 [fp] 28 [sp] 30 [lr] 64 [?fp]
;; live gen
;; live kill
;; lr out
;; live out
Finding needed instructions:
Adding insn 19 to worklist
Finished finding needed instructions:
processing block 2 lr out = 1 [s1] 26 [fp] 28 [sp] 30 [lr] 64 [?fp] 65 [?ap]
Adding insn 18 to worklist
Adding insn 12 to worklist
Adding insn 11 to worklist
Adding insn 10 to worklist
Adding insn 9 to worklist
Adding insn 4 to worklist
Adding insn 22 to worklist
Adding insn 2 to worklist
Adding insn 21 to worklist
df_worklist_dataflow_doublequeue: n_basic_blocks 3 n_edges 2 count 3 ( 1)
insn_cost 4 for 21: r86:SI=s1:SI
REG_DEAD s1:SI
insn_cost 4 for 2: r78:SI=r86:SI
REG_DEAD r86:SI
insn_cost 4 for 22: r87:SI=s2:SI
REG_DEAD s2:SI
insn_cost 4 for 4: r80:SI=r87:SI
REG_DEAD r87:SI
insn_cost 4 for 9: r82:SI=sign_extend(r78:SI#0)
REG_DEAD r78:SI
insn_cost 4 for 10: r83:SI=sign_extend(r80:SI#0)
REG_DEAD r80:SI
insn_cost 20 for 11: r84:SI=r82:SI*r83:SI
REG_DEAD r83:SI
REG_DEAD r82:SI
insn_cost 8 for 12: r85:SI=r84:SI>>0xf
REG_DEAD r84:SI
insn_cost 4 for 18: s1:HI=r85:SI#0
REG_DEAD r85:SI
insn_cost 0 for 19: use s1:HI
Trying 2 -> 9:
2: r78:SI=r86:SI
REG_DEAD r86:SI
9: r82:SI=sign_extend(r78:SI#0)
REG_DEAD r78:SI
Successfully matched this instruction:
(set (reg:SI 82 [ op1D.1380 ])
(sign_extend:SI (subreg:HI (reg:SI 86) 0)))
allowing combination of insns 2 and 9
original costs 4 + 4 = 8
replacement cost 4
deferring deletion of insn with uid = 2.
modifying insn i3 9: r82:SI=sign_extend(r86:SI#0)
REG_DEAD r86:SI
deferring rescan insn with uid = 9.
Trying 4 -> 10:
4: r80:SI=r87:SI
REG_DEAD r87:SI
10: r83:SI=sign_extend(r80:SI#0)
REG_DEAD r80:SI
Successfully matched this instruction:
(set (reg:SI 83 [ op2D.1381 ])
(sign_extend:SI (subreg:HI (reg:SI 87) 0)))
allowing combination of insns 4 and 10
original costs 4 + 4 = 8
replacement cost 4
deferring deletion of insn with uid = 4.
modifying insn i3 10: r83:SI=sign_extend(r87:SI#0)
REG_DEAD r87:SI
deferring rescan insn with uid = 10.
Trying 9 -> 11:
9: r82:SI=sign_extend(r86:SI#0)
REG_DEAD r86:SI
11: r84:SI=r82:SI*r83:SI
REG_DEAD r83:SI
REG_DEAD r82:SI
Failed to match this instruction:
(set (reg:SI 84)
(mult:SI (sign_extend:SI (subreg:HI (reg:SI 86) 0))
(reg:SI 83 [ op2D.1381 ])))
Trying 10 -> 11:
10: r83:SI=sign_extend(r87:SI#0)
REG_DEAD r87:SI
11: r84:SI=r82:SI*r83:SI
REG_DEAD r83:SI
REG_DEAD r82:SI
Failed to match this instruction:
(set (reg:SI 84)
(mult:SI (sign_extend:SI (subreg:HI (reg:SI 87) 0))
(reg:SI 82 [ op1D.1380 ])))
Trying 10, 9 -> 11:
10: r83:SI=sign_extend(r87:SI#0)
REG_DEAD r87:SI
9: r82:SI=sign_extend(r86:SI#0)
REG_DEAD r86:SI
11: r84:SI=r82:SI*r83:SI
REG_DEAD r83:SI
REG_DEAD r82:SI
Failed to match this instruction:
(set (reg:SI 84)
(mult:SI (sign_extend:SI (subreg:HI (reg:SI 87) 0))
(sign_extend:SI (subreg:HI (reg:SI 86) 0))))
Successfully matched this instruction:
(set (reg:SI 83 [ op2D.1381 ])
(sign_extend:SI (subreg:HI (reg:SI 86) 0)))
Failed to match this instruction:
(set (reg:SI 84)
(mult:SI (sign_extend:SI (subreg:HI (reg:SI 87) 0))
(reg:SI 83 [ op2D.1381 ])))
Trying 11 -> 12:
11: r84:SI=r82:SI*r83:SI
REG_DEAD r83:SI
REG_DEAD r82:SI
12: r85:SI=r84:SI>>0xf
REG_DEAD r84:SI
Failed to match this instruction:
(set (reg:SI 85)
(ashiftrt:SI (mult:SI (reg:SI 82 [ op1D.1380 ])
(reg:SI 83 [ op2D.1381 ]))
(const_int 15 [0xf])))
Trying 9, 11 -> 12:
9: r82:SI=sign_extend(r86:SI#0)
REG_DEAD r86:SI
11: r84:SI=r82:SI*r83:SI
REG_DEAD r83:SI
REG_DEAD r82:SI
12: r85:SI=r84:SI>>0xf
REG_DEAD r84:SI
Failed to match this instruction:
(set (reg:SI 85)
(ashiftrt:SI (mult:SI (sign_extend:SI (subreg:HI (reg:SI 86) 0))
(reg:SI 83 [ op2D.1381 ]))
(const_int 15 [0xf])))
Successfully matched this instruction:
(set (reg:SI 84)
(sign_extend:SI (subreg:HI (reg:SI 86) 0)))
Failed to match this instruction:
(set (reg:SI 85)
(ashiftrt:SI (mult:SI (reg:SI 84)
(reg:SI 83 [ op2D.1381 ]))
(const_int 15 [0xf])))
Trying 10, 11 -> 12:
10: r83:SI=sign_extend(r87:SI#0)
REG_DEAD r87:SI
11: r84:SI=r82:SI*r83:SI
REG_DEAD r83:SI
REG_DEAD r82:SI
12: r85:SI=r84:SI>>0xf
REG_DEAD r84:SI
Failed to match this instruction:
(set (reg:SI 85)
(ashiftrt:SI (mult:SI (sign_extend:SI (subreg:HI (reg:SI 87) 0))
(reg:SI 82 [ op1D.1380 ]))
(const_int 15 [0xf])))
Successfully matched this instruction:
(set (reg:SI 84)
(sign_extend:SI (subreg:HI (reg:SI 87) 0)))
Failed to match this instruction:
(set (reg:SI 85)
(ashiftrt:SI (mult:SI (reg:SI 84)
(reg:SI 82 [ op1D.1380 ]))
(const_int 15 [0xf])))
Trying 12 -> 18:
12: r85:SI=r84:SI>>0xf
REG_DEAD r84:SI
18: s1:HI=r85:SI#0
REG_DEAD r85:SI
Failed to match this instruction:
(set (reg:SI 1 s1)
(zero_extract:SI (reg:SI 84)
(const_int 16 [0x10])
(const_int 15 [0xf])))
Failed to match this instruction:
(set (reg:SI 1 s1)
(and:SI (lshiftrt:SI (reg:SI 84)
(const_int 15 [0xf]))
(const_int 65535 [0xffff])))
Trying 11, 12 -> 18:
11: r84:SI=r82:SI*r83:SI
REG_DEAD r83:SI
REG_DEAD r82:SI
12: r85:SI=r84:SI>>0xf
REG_DEAD r84:SI
18: s1:HI=r85:SI#0
REG_DEAD r85:SI
Failed to match this instruction:
(set (reg:SI 1 s1)
(zero_extract:SI (mult:SI (reg:SI 82 [ op1D.1380 ])
(reg:SI 83 [ op2D.1381 ]))
(const_int 16 [0x10])
(const_int 15 [0xf])))
Failed to match this instruction:
(set (reg:SI 1 s1)
(and:SI (lshiftrt:SI (mult:SI (reg:SI 82 [ op1D.1380 ])
(reg:SI 83 [ op2D.1381 ]))
(const_int 15 [0xf]))
(const_int 65535 [0xffff])))
Successfully matched this instruction:
(set (reg:SI 85)
(mult:SI (reg:SI 82 [ op1D.1380 ])
(reg:SI 83 [ op2D.1381 ])))
Failed to match this instruction:
(set (reg:SI 1 s1)
(zero_extract:SI (reg:SI 85)
(const_int 16 [0x10])
(const_int 15 [0xf])))
Failed to match this instruction:
(set (reg:SI 1 s1)
(and:SI (lshiftrt:SI (reg:SI 85)
(const_int 15 [0xf]))
(const_int 65535 [0xffff])))
Trying 18 -> 19:
18: s1:HI=r85:SI#0
REG_DEAD r85:SI
19: use s1:HI
Failed to match this instruction:
(parallel [
(use (subreg:HI (reg:SI 85) 0))
(set (reg/i:HI 1 s1)
(subreg:HI (reg:SI 85) 0))
])
Failed to match this instruction:
(parallel [
(use (subreg:HI (reg:SI 85) 0))
(set (reg/i:HI 1 s1)
(subreg:HI (reg:SI 85) 0))
])
Trying 12, 18 -> 19:
12: r85:SI=r84:SI>>0xf
REG_DEAD r84:SI
18: s1:HI=r85:SI#0
REG_DEAD r85:SI
19: use s1:HI
Failed to match this instruction:
(parallel [
(use (subreg:HI (ashiftrt:SI (reg:SI 84)
(const_int 15 [0xf])) 0))
(set (reg:SI 1 s1)
(zero_extract:SI (reg:SI 84)
(const_int 16 [0x10])
(const_int 15 [0xf])))
])
Failed to match this instruction:
(parallel [
(use (subreg:HI (ashiftrt:SI (reg:SI 84)
(const_int 15 [0xf])) 0))
(set (reg:SI 1 s1)
(zero_extract:SI (reg:SI 84)
(const_int 16 [0x10])
(const_int 15 [0xf])))
])
Failed to match this instruction:
(parallel [
(use (subreg:HI (ashiftrt:SI (reg:SI 84)
(const_int 15 [0xf])) 0))
(set (reg:SI 1 s1)
(and:SI (lshiftrt:SI (reg:SI 84)
(const_int 15 [0xf]))
(const_int 65535 [0xffff])))
])
Failed to match this instruction:
(parallel [
(use (subreg:HI (ashiftrt:SI (reg:SI 84)
(const_int 15 [0xf])) 0))
(set (reg:SI 1 s1)
(and:SI (lshiftrt:SI (reg:SI 84)
(const_int 15 [0xf]))
(const_int 65535 [0xffff])))
])
Pass statistics of "combine": ----------------
two-insn combine: 2
starting the processing of deferred insns
rescanning insn with uid = 9.
rescanning insn with uid = 10.
ending the processing of deferred insns
mulq
Dataflow summary:
;; fully invalidated by EH 1 [s1] 2 [s2] 3 [s3] 4 [s4] 5 [s5] 6 [s6] 7 [s7] 8 [s8] 9 [s9] 10 [s10] 11 [s11] 12 [s12] 13 [s13] 14 [s14]
;; hardware regs used 28 [sp] 64 [?fp] 65 [?ap]
;; regular block artificial uses 26 [fp] 28 [sp] 64 [?fp] 65 [?ap]
;; eh block artificial uses 26 [fp] 28 [sp] 64 [?fp] 65 [?ap]
;; entry block defs 1 [s1] 2 [s2] 3 [s3] 4 [s4] 5 [s5] 6 [s6] 7 [s7] 8 [s8] 26 [fp] 28 [sp] 30 [lr] 64 [?fp] 65 [?ap]
;; exit block uses 1 [s1] 26 [fp] 28 [sp] 30 [lr] 64 [?fp]
;; regs ever live 1 [s1] 2 [s2]
;; ref usage r1={2d,3u} r2={1d,1u} r3={1d} r4={1d} r5={1d} r6={1d} r7={1d} r8={1d} r26={1d,2u} r28={1d,2u} r30={1d,1u} r64={1d,2u} r65={1d,1u} r82={1d,1u} r83={1d,1u} r84={1d,1u} r85={1d,1u} r86={1d,1u} r87={1d,1u}
;; total ref usage 38{20d,18u,0e} in 8{8 regular + 0 call} insns.
;; basic block 2, loop depth 0, count 1073741824 (estimated locally), maybe hot
;; prev block 0, next block 1, flags: (RTL, MODIFIED)
;; pred: ENTRY [always] count:1073741824 (estimated locally) (FALLTHRU)
;; bb 2 artificial_defs: { }
;; bb 2 artificial_uses: { u0(26){ }u1(28){ }u2(64){ }u3(65){ }}
;; lr in 1 [s1] 2 [s2] 26 [fp] 28 [sp] 30 [lr] 64 [?fp] 65 [?ap]
;; lr use 1 [s1] 2 [s2] 26 [fp] 28 [sp] 64 [?fp] 65 [?ap]
;; lr def 1 [s1] 78 80 82 83 84 85 86 87
;; live in 1 [s1] 2 [s2] 26 [fp] 28 [sp] 30 [lr] 64 [?fp] 65 [?ap]
;; live gen 1 [s1] 78 80 82 83 84 85 86 87
;; live kill
(note 7 0 21 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(insn 21 7 2 2 (set (reg:SI 86)
(reg:SI 1 s1 [ op1D.1380 ])) "mulq.c":1:34 -1
(expr_list:REG_DEAD (reg:SI 1 s1 [ op1D.1380 ])
(nil)))
(note 2 21 22 2 NOTE_INSN_DELETED)
(insn 22 2 4 2 (set (reg:SI 87)
(reg:SI 2 s2 [ op2D.1381 ])) "mulq.c":1:34 -1
(expr_list:REG_DEAD (reg:SI 2 s2 [ op2D.1381 ])
(nil)))
(note 4 22 6 2 NOTE_INSN_DELETED)
(note 6 4 9 2 NOTE_INSN_FUNCTION_BEG)
(insn 9 6 10 2 (set (reg:SI 82 [ op1D.1380 ])
(sign_extend:SI (subreg:HI (reg:SI 86) 0))) "mulq.c":2:20 60 {extendhisi2}
(expr_list:REG_DEAD (reg:SI 86)
(nil)))
(insn 10 9 11 2 (set (reg:SI 83 [ op2D.1381 ])
(sign_extend:SI (subreg:HI (reg:SI 87) 0))) "mulq.c":2:32 60 {extendhisi2}
(expr_list:REG_DEAD (reg:SI 87)
(nil)))
(insn 11 10 12 2 (set (reg:SI 84)
(mult:SI (reg:SI 82 [ op1D.1380 ])
(reg:SI 83 [ op2D.1381 ]))) "mulq.c":2:30 5 {mulsi3}
(expr_list:REG_DEAD (reg:SI 83 [ op2D.1381 ])
(expr_list:REG_DEAD (reg:SI 82 [ op1D.1380 ])
(nil))))
(insn 12 11 18 2 (set (reg:SI 85)
(ashiftrt:SI (reg:SI 84)
(const_int 15 [0xf]))) "mulq.c":2:43 39 {ashrsi3}
(expr_list:REG_DEAD (reg:SI 84)
(nil)))
(insn 18 12 19 2 (set (reg/i:HI 1 s1)
(subreg:HI (reg:SI 85) 0)) "mulq.c":3:1 68 {*movhi}
(expr_list:REG_DEAD (reg:SI 85)
(nil)))
(insn 19 18 0 2 (use (reg/i:HI 1 s1)) "mulq.c":3:1 -1
(nil))
;; succ: EXIT [always] count:1073741824 (estimated locally) (FALLTHRU)
;; lr out 1 [s1] 26 [fp] 28 [sp] 30 [lr] 64 [?fp] 65 [?ap]
;; live out 1 [s1] 26 [fp] 28 [sp] 30 [lr] 64 [?fp] 65 [?ap]
;; Combiner totals: 12 attempts, 12 substitutions (2 requiring new space),
;; 2 successes.