On 2020-02-09 19:02, Segher Boessenkool wrote:
On Sun, Feb 09, 2020 at 12:15:03PM +0100, m wrote:
On 2020-02-07 16:44, Segher Boessenkool wrote:
   (define_insn "smulhshi3"
     [(set (match_operand:HI 0 "register_operand" "=r")
           (truncate:HI
             (ashiftrt:SI
               (mult:SI
                 (sign_extend:SI (match_operand:HI 1 "register_operand" "r"))
                 (sign_extend:SI (match_operand:HI 2 "register_operand" "r")))
               (const_int 15))))]
     "TARGET_PACKED_OPS"
     "mulq.h\\t%0, %1, %2")

However, I am unable to trigger this code path. I have tried with the
following C code:

short mulq(short op1, short op2) {
   return (short) (((int) op1 * (int) op2) >> (32 / 2 - 1));
}

But I just get the regular 4-instruction result (2x sign extend, 1x mul,
1x shift).
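For reference, those four instructions are insns 9, 10, 11 and 12 in the attached dump. Insn 18 then copies the low half of the result into s1.

The C sketch below mirrors that sequence one step at a time. The helper names are made up for illustration. It assumes GCC's arithmetic `>>` on negative values. It also assumes a little-endian target, where `(subreg:HI ... 0)` selects the low 16 bits.

```c
#include <stdint.h>

/* Hypothetical helper: the low 16 bits of a 32-bit value, read as signed.
   Assumes a little-endian target, where (subreg:HI (reg:SI ...) 0) is the
   low half.  Avoids implementation-defined narrowing conversions. */
static int16_t lowpart_hi(int32_t x)
{
    uint16_t u = (uint16_t) x;                 /* reduction modulo 2^16 */
    return u >= 0x8000u ? (int16_t) (u - 0x10000) : (int16_t) u;
}

/* Hypothetical step-by-step model of insns 9, 10, 11, 12 and 18 in the
   combine dump.  Assumes >> on a negative value is an arithmetic shift,
   as it is with GCC (this is what ashiftrt:SI does). */
int16_t mulq_steps(int16_t op1, int16_t op2)
{
    int32_t r82 = op1;          /* insn 9:  sign_extend:SI of the HI lowpart */
    int32_t r83 = op2;          /* insn 10: sign_extend:SI of the HI lowpart */
    int32_t r84 = r82 * r83;    /* insn 11: mult:SI, |product| <= 2^30, no overflow */
    int32_t r85 = r84 >> 15;    /* insn 12: ashiftrt:SI by 15 */
    return lowpart_hi(r85);     /* insn 18: s1:HI = low half of r85 */
}
```

Read as Q15 fixed point, `mulq_steps(16384, 16384)` is 8192, that is, 0.5 × 0.5 = 0.25.

The corner case `mulq_steps(-32768, -32768)` gives -32768. The 32-bit value after insn 12 is 32768, which does not fit in 16 bits. So the narrowing in insn 18 is not a no-op, and neither is the `truncate:HI` in the pattern.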
What does -fdump-rtl-combine-all show it tried?  *Did* it try anything?
Cool option. I'm not really sure how to read the output, though. The
closest it seems to try to match is this:
For every combination tried, it shows "Trying 2 -> 6:" etc., followed by
the instructions it started with (which is very important), and then
what worked and what didn't, and more debug information.

I usually need to see that whole block (everything until the next
"Trying:").
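The block structure described above is easy to pull out mechanically when a dump gets long.

The C function below is a hypothetical helper, not part of GCC. Given the dump as a string and a block's full first line, it returns that block. It uses the "next `Trying `" rule described above. As an assumption, it also stops at a line starting with `Pass statistics`, since that line follows the last block in the attached dump.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of GCC): locate one block of a
   -fdump-rtl-combine-all dump held in memory.

   `header` is the full first line of the block, e.g. "Trying 9, 11 -> 12:".
   The block ends at the next line starting with "Trying " or
   "Pass statistics", or at the end of the dump.  On success the block
   start is returned and its length in bytes is stored in *len; if no
   line starts with `header`, NULL is returned and *len is untouched. */
const char *find_combine_block(const char *dump, const char *header,
                               size_t *len)
{
    size_t hlen = strlen(header);
    const char *line = dump;
    const char *start = NULL;

    while (*line != '\0') {
        /* Start of the following line, or the terminating NUL. */
        const char *nl = strchr(line, '\n');
        const char *next = nl ? nl + 1 : line + strlen(line);

        if (start == NULL) {
            if (strncmp(line, header, hlen) == 0)
                start = line;                   /* found the header line */
        } else if (strncmp(line, "Trying ", 7) == 0 ||
                   strncmp(line, "Pass statistics", 15) == 0) {
            break;                              /* the next block begins here */
        }
        line = next;
    }

    if (start == NULL)
        return NULL;
    *len = (size_t) (line - start);
    return start;
}
```

Passing the header with its trailing colon keeps `"Trying 2 -> 9:"` from also matching a longer header such as `"Trying 2 -> 90:"`.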


I've attached the full output of -fdump-rtl-combine-all.


Failed to match this instruction:
(set (reg:SI 85)
     (ashiftrt:SI (mult:SI (sign_extend:SI (subreg:HI (reg:SI 86) 0))
             (reg:SI 83 [ op2D.1381 ]))
         (const_int 15 [0xf])))


It seems that it has already decided to split the instruction into
several operations (the truncate operation is not there, and the second
sign_extend:SI (subreg:HI ...) is also missing).
The code probably sign-extends the result; I need to see the full thing
to really know. Similarly, a sign_extend of a const_int is not
canonical RTL; it is always written as just a const_int.

You'll probably need to write a few extra patterns to recognise all the
options here.
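On the missing truncate: the attached dump never contains one. The narrowing is the lowpart subreg in insn 18. In the "Trying 12 -> 18" block, combine rewrites the low half of `(ashiftrt:SI x 15)` in two ways:

- as `(zero_extract:SI x 16 15)`;
- as `(and:SI (lshiftrt:SI x 15) 65535)`.

The C sketch below computes all three forms; the function names are hypothetical. They agree because the low 16 bits of `x >> 15` are bits 15..30 of `x`, whichever shift is used. The sketch assumes two things:

- GCC's arithmetic `>>` on negative values;
- a `zero_extract` position counted from the least significant bit (BITS_BIG_ENDIAN == 0).

```c
#include <stdint.h>

/* (subreg:HI (ashiftrt:SI x 15) 0): arithmetic shift, keep the low 16 bits.
   Assumes >> on a negative int32_t is arithmetic, as with GCC. */
uint16_t low16_of_ashiftrt(int32_t x)
{
    return (uint16_t) (x >> 15);
}

/* (and:SI (lshiftrt:SI x 15) (const_int 65535)): logical shift, then mask. */
uint32_t and_of_lshiftrt(int32_t x)
{
    return ((uint32_t) x >> 15) & 0xffffu;
}

/* (zero_extract:SI x (const_int 16) (const_int 15)): the 16-bit field that
   starts at bit 15, counting from the least significant bit. */
uint32_t zero_extract_16_15(int32_t x)
{
    uint32_t field = 0;
    for (int i = 0; i < 16; i++)
        field |= (((uint32_t) x >> (15 + i)) & 1u) << i;
    return field;
}
```

All three agree only in the low 16 bits. The full SI value of `(ashiftrt:SI x 15)` can have a different upper half. An example is the failed matches for insn 12: for x = -1 it is 0xffffffff, while the other two forms give 0x0000ffff.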


I haven't had much time to dig more into this, but I hope to do so soon.

Regards,

  Marcus

;; Function mulq (mulq, funcdef_no=0, decl_uid=1382, cgraph_uid=1, symbol_order=0)


Pass statistics of "combine": ----------------

scanning new insn with uid = 21.
rescanning insn with uid = 2.
scanning new insn with uid = 22.
rescanning insn with uid = 4.
starting the processing of deferred insns
ending the processing of deferred insns
df_analyze called
df_worklist_dataflow_doublequeue: n_basic_blocks 3 n_edges 2 count 3 (    1)


mulq

Dataflow summary:
def_info->table_size = 20, use_info->table_size = 18
;;  fully invalidated by EH      1 [s1] 2 [s2] 3 [s3] 4 [s4] 5 [s5] 6 [s6] 7 [s7] 8 [s8] 9 [s9] 10 [s10] 11 [s11] 12 [s12] 13 [s13] 14 [s14]
;;  hardware regs used   28 [sp] 64 [?fp] 65 [?ap]
;;  regular block artificial uses        26 [fp] 28 [sp] 64 [?fp] 65 [?ap]
;;  eh block artificial uses     26 [fp] 28 [sp] 64 [?fp] 65 [?ap]
;;  entry block defs     1 [s1] 2 [s2] 3 [s3] 4 [s4] 5 [s5] 6 [s6] 7 [s7] 8 [s8] 26 [fp] 28 [sp] 30 [lr] 64 [?fp] 65 [?ap]
;;  exit block uses      1 [s1] 26 [fp] 28 [sp] 30 [lr] 64 [?fp]
;;  regs ever live       1 [s1] 2 [s2]
;;  ref usage   r1={2d,3u} r2={1d,1u} r3={1d} r4={1d} r5={1d} r6={1d} r7={1d} r8={1d} r26={1d,2u} r28={1d,2u} r30={1d,1u} r64={1d,2u} r65={1d,1u} r78={1d,1u} r80={1d,1u} r82={1d,1u} r83={1d,1u} r84={1d,1u} r85={1d,1u} r86={1d,1u} r87={1d,1u}
;;    total ref usage 42{22d,20u,0e} in 10{10 regular + 0 call} insns.

( )->[0]->( 2 )
;; bb 0 artificial_defs: { d1(1){ }d2(2){ }d3(3){ }d4(4){ }d5(5){ }d6(6){ }d7(7){ }d8(8){ }d9(26){ }d10(28){ }d11(30){ }d12(64){ }d13(65){ }}
;; bb 0 artificial_uses: { }
;; lr  in       
;; lr  use      
;; lr  def       1 [s1] 2 [s2] 3 [s3] 4 [s4] 5 [s5] 6 [s6] 7 [s7] 8 [s8] 26 [fp] 28 [sp] 30 [lr] 64 [?fp] 65 [?ap]
;; live  in     
;; live  gen     1 [s1] 2 [s2] 3 [s3] 4 [s4] 5 [s5] 6 [s6] 7 [s7] 8 [s8] 26 [fp] 28 [sp] 30 [lr] 64 [?fp] 65 [?ap]
;; live  kill   
;; lr  out       1 [s1] 2 [s2] 26 [fp] 28 [sp] 30 [lr] 64 [?fp] 65 [?ap]
;; live  out     1 [s1] 2 [s2] 26 [fp] 28 [sp] 30 [lr] 64 [?fp] 65 [?ap]

( 0 )->[2]->( 1 )
;; bb 2 artificial_defs: { }
;; bb 2 artificial_uses: { u0(26){ }u1(28){ }u2(64){ }u3(65){ }}
;; lr  in        1 [s1] 2 [s2] 26 [fp] 28 [sp] 30 [lr] 64 [?fp] 65 [?ap]
;; lr  use       1 [s1] 2 [s2] 26 [fp] 28 [sp] 64 [?fp] 65 [?ap]
;; lr  def       1 [s1] 78 80 82 83 84 85 86 87
;; live  in      1 [s1] 2 [s2] 26 [fp] 28 [sp] 30 [lr] 64 [?fp] 65 [?ap]
;; live  gen     1 [s1] 78 80 82 83 84 85
;; live  kill   
;; lr  out       1 [s1] 26 [fp] 28 [sp] 30 [lr] 64 [?fp] 65 [?ap]
;; live  out     1 [s1] 26 [fp] 28 [sp] 30 [lr] 64 [?fp] 65 [?ap]

( 2 )->[1]->( )
;; bb 1 artificial_defs: { }
;; bb 1 artificial_uses: { u13(1){ }u14(26){ }u15(28){ }u16(30){ }u17(64){ }}
;; lr  in        1 [s1] 26 [fp] 28 [sp] 30 [lr] 64 [?fp]
;; lr  use       1 [s1] 26 [fp] 28 [sp] 30 [lr] 64 [?fp]
;; lr  def      
;; live  in      1 [s1] 26 [fp] 28 [sp] 30 [lr] 64 [?fp]
;; live  gen    
;; live  kill   
;; lr  out      
;; live  out    

Finding needed instructions:
  Adding insn 19 to worklist
Finished finding needed instructions:
processing block 2 lr out =  1 [s1] 26 [fp] 28 [sp] 30 [lr] 64 [?fp] 65 [?ap]
  Adding insn 18 to worklist
  Adding insn 12 to worklist
  Adding insn 11 to worklist
  Adding insn 10 to worklist
  Adding insn 9 to worklist
  Adding insn 4 to worklist
  Adding insn 22 to worklist
  Adding insn 2 to worklist
  Adding insn 21 to worklist
df_worklist_dataflow_doublequeue: n_basic_blocks 3 n_edges 2 count 3 (    1)
insn_cost 4 for    21: r86:SI=s1:SI
      REG_DEAD s1:SI
insn_cost 4 for     2: r78:SI=r86:SI
      REG_DEAD r86:SI
insn_cost 4 for    22: r87:SI=s2:SI
      REG_DEAD s2:SI
insn_cost 4 for     4: r80:SI=r87:SI
      REG_DEAD r87:SI
insn_cost 4 for     9: r82:SI=sign_extend(r78:SI#0)
      REG_DEAD r78:SI
insn_cost 4 for    10: r83:SI=sign_extend(r80:SI#0)
      REG_DEAD r80:SI
insn_cost 20 for    11: r84:SI=r82:SI*r83:SI
      REG_DEAD r83:SI
      REG_DEAD r82:SI
insn_cost 8 for    12: r85:SI=r84:SI>>0xf
      REG_DEAD r84:SI
insn_cost 4 for    18: s1:HI=r85:SI#0
      REG_DEAD r85:SI
insn_cost 0 for    19: use s1:HI

Trying 2 -> 9:
    2: r78:SI=r86:SI
      REG_DEAD r86:SI
    9: r82:SI=sign_extend(r78:SI#0)
      REG_DEAD r78:SI
Successfully matched this instruction:
(set (reg:SI 82 [ op1D.1380 ])
    (sign_extend:SI (subreg:HI (reg:SI 86) 0)))
allowing combination of insns 2 and 9
original costs 4 + 4 = 8
replacement cost 4
deferring deletion of insn with uid = 2.
modifying insn i3     9: r82:SI=sign_extend(r86:SI#0)
      REG_DEAD r86:SI
deferring rescan insn with uid = 9.

Trying 4 -> 10:
    4: r80:SI=r87:SI
      REG_DEAD r87:SI
   10: r83:SI=sign_extend(r80:SI#0)
      REG_DEAD r80:SI
Successfully matched this instruction:
(set (reg:SI 83 [ op2D.1381 ])
    (sign_extend:SI (subreg:HI (reg:SI 87) 0)))
allowing combination of insns 4 and 10
original costs 4 + 4 = 8
replacement cost 4
deferring deletion of insn with uid = 4.
modifying insn i3    10: r83:SI=sign_extend(r87:SI#0)
      REG_DEAD r87:SI
deferring rescan insn with uid = 10.

Trying 9 -> 11:
    9: r82:SI=sign_extend(r86:SI#0)
      REG_DEAD r86:SI
   11: r84:SI=r82:SI*r83:SI
      REG_DEAD r83:SI
      REG_DEAD r82:SI
Failed to match this instruction:
(set (reg:SI 84)
    (mult:SI (sign_extend:SI (subreg:HI (reg:SI 86) 0))
        (reg:SI 83 [ op2D.1381 ])))

Trying 10 -> 11:
   10: r83:SI=sign_extend(r87:SI#0)
      REG_DEAD r87:SI
   11: r84:SI=r82:SI*r83:SI
      REG_DEAD r83:SI
      REG_DEAD r82:SI
Failed to match this instruction:
(set (reg:SI 84)
    (mult:SI (sign_extend:SI (subreg:HI (reg:SI 87) 0))
        (reg:SI 82 [ op1D.1380 ])))

Trying 10, 9 -> 11:
   10: r83:SI=sign_extend(r87:SI#0)
      REG_DEAD r87:SI
    9: r82:SI=sign_extend(r86:SI#0)
      REG_DEAD r86:SI
   11: r84:SI=r82:SI*r83:SI
      REG_DEAD r83:SI
      REG_DEAD r82:SI
Failed to match this instruction:
(set (reg:SI 84)
    (mult:SI (sign_extend:SI (subreg:HI (reg:SI 87) 0))
        (sign_extend:SI (subreg:HI (reg:SI 86) 0))))
Successfully matched this instruction:
(set (reg:SI 83 [ op2D.1381 ])
    (sign_extend:SI (subreg:HI (reg:SI 86) 0)))
Failed to match this instruction:
(set (reg:SI 84)
    (mult:SI (sign_extend:SI (subreg:HI (reg:SI 87) 0))
        (reg:SI 83 [ op2D.1381 ])))

Trying 11 -> 12:
   11: r84:SI=r82:SI*r83:SI
      REG_DEAD r83:SI
      REG_DEAD r82:SI
   12: r85:SI=r84:SI>>0xf
      REG_DEAD r84:SI
Failed to match this instruction:
(set (reg:SI 85)
    (ashiftrt:SI (mult:SI (reg:SI 82 [ op1D.1380 ])
            (reg:SI 83 [ op2D.1381 ]))
        (const_int 15 [0xf])))

Trying 9, 11 -> 12:
    9: r82:SI=sign_extend(r86:SI#0)
      REG_DEAD r86:SI
   11: r84:SI=r82:SI*r83:SI
      REG_DEAD r83:SI
      REG_DEAD r82:SI
   12: r85:SI=r84:SI>>0xf
      REG_DEAD r84:SI
Failed to match this instruction:
(set (reg:SI 85)
    (ashiftrt:SI (mult:SI (sign_extend:SI (subreg:HI (reg:SI 86) 0))
            (reg:SI 83 [ op2D.1381 ]))
        (const_int 15 [0xf])))
Successfully matched this instruction:
(set (reg:SI 84)
    (sign_extend:SI (subreg:HI (reg:SI 86) 0)))
Failed to match this instruction:
(set (reg:SI 85)
    (ashiftrt:SI (mult:SI (reg:SI 84)
            (reg:SI 83 [ op2D.1381 ]))
        (const_int 15 [0xf])))

Trying 10, 11 -> 12:
   10: r83:SI=sign_extend(r87:SI#0)
      REG_DEAD r87:SI
   11: r84:SI=r82:SI*r83:SI
      REG_DEAD r83:SI
      REG_DEAD r82:SI
   12: r85:SI=r84:SI>>0xf
      REG_DEAD r84:SI
Failed to match this instruction:
(set (reg:SI 85)
    (ashiftrt:SI (mult:SI (sign_extend:SI (subreg:HI (reg:SI 87) 0))
            (reg:SI 82 [ op1D.1380 ]))
        (const_int 15 [0xf])))
Successfully matched this instruction:
(set (reg:SI 84)
    (sign_extend:SI (subreg:HI (reg:SI 87) 0)))
Failed to match this instruction:
(set (reg:SI 85)
    (ashiftrt:SI (mult:SI (reg:SI 84)
            (reg:SI 82 [ op1D.1380 ]))
        (const_int 15 [0xf])))

Trying 12 -> 18:
   12: r85:SI=r84:SI>>0xf
      REG_DEAD r84:SI
   18: s1:HI=r85:SI#0
      REG_DEAD r85:SI
Failed to match this instruction:
(set (reg:SI 1 s1)
    (zero_extract:SI (reg:SI 84)
        (const_int 16 [0x10])
        (const_int 15 [0xf])))
Failed to match this instruction:
(set (reg:SI 1 s1)
    (and:SI (lshiftrt:SI (reg:SI 84)
            (const_int 15 [0xf]))
        (const_int 65535 [0xffff])))

Trying 11, 12 -> 18:
   11: r84:SI=r82:SI*r83:SI
      REG_DEAD r83:SI
      REG_DEAD r82:SI
   12: r85:SI=r84:SI>>0xf
      REG_DEAD r84:SI
   18: s1:HI=r85:SI#0
      REG_DEAD r85:SI
Failed to match this instruction:
(set (reg:SI 1 s1)
    (zero_extract:SI (mult:SI (reg:SI 82 [ op1D.1380 ])
            (reg:SI 83 [ op2D.1381 ]))
        (const_int 16 [0x10])
        (const_int 15 [0xf])))
Failed to match this instruction:
(set (reg:SI 1 s1)
    (and:SI (lshiftrt:SI (mult:SI (reg:SI 82 [ op1D.1380 ])
                (reg:SI 83 [ op2D.1381 ]))
            (const_int 15 [0xf]))
        (const_int 65535 [0xffff])))
Successfully matched this instruction:
(set (reg:SI 85)
    (mult:SI (reg:SI 82 [ op1D.1380 ])
        (reg:SI 83 [ op2D.1381 ])))
Failed to match this instruction:
(set (reg:SI 1 s1)
    (zero_extract:SI (reg:SI 85)
        (const_int 16 [0x10])
        (const_int 15 [0xf])))
Failed to match this instruction:
(set (reg:SI 1 s1)
    (and:SI (lshiftrt:SI (reg:SI 85)
            (const_int 15 [0xf]))
        (const_int 65535 [0xffff])))

Trying 18 -> 19:
   18: s1:HI=r85:SI#0
      REG_DEAD r85:SI
   19: use s1:HI
Failed to match this instruction:
(parallel [
        (use (subreg:HI (reg:SI 85) 0))
        (set (reg/i:HI 1 s1)
            (subreg:HI (reg:SI 85) 0))
    ])
Failed to match this instruction:
(parallel [
        (use (subreg:HI (reg:SI 85) 0))
        (set (reg/i:HI 1 s1)
            (subreg:HI (reg:SI 85) 0))
    ])

Trying 12, 18 -> 19:
   12: r85:SI=r84:SI>>0xf
      REG_DEAD r84:SI
   18: s1:HI=r85:SI#0
      REG_DEAD r85:SI
   19: use s1:HI
Failed to match this instruction:
(parallel [
        (use (subreg:HI (ashiftrt:SI (reg:SI 84)
                    (const_int 15 [0xf])) 0))
        (set (reg:SI 1 s1)
            (zero_extract:SI (reg:SI 84)
                (const_int 16 [0x10])
                (const_int 15 [0xf])))
    ])
Failed to match this instruction:
(parallel [
        (use (subreg:HI (ashiftrt:SI (reg:SI 84)
                    (const_int 15 [0xf])) 0))
        (set (reg:SI 1 s1)
            (zero_extract:SI (reg:SI 84)
                (const_int 16 [0x10])
                (const_int 15 [0xf])))
    ])
Failed to match this instruction:
(parallel [
        (use (subreg:HI (ashiftrt:SI (reg:SI 84)
                    (const_int 15 [0xf])) 0))
        (set (reg:SI 1 s1)
            (and:SI (lshiftrt:SI (reg:SI 84)
                    (const_int 15 [0xf]))
                (const_int 65535 [0xffff])))
    ])
Failed to match this instruction:
(parallel [
        (use (subreg:HI (ashiftrt:SI (reg:SI 84)
                    (const_int 15 [0xf])) 0))
        (set (reg:SI 1 s1)
            (and:SI (lshiftrt:SI (reg:SI 84)
                    (const_int 15 [0xf]))
                (const_int 65535 [0xffff])))
    ])

Pass statistics of "combine": ----------------
two-insn combine: 2

starting the processing of deferred insns
rescanning insn with uid = 9.
rescanning insn with uid = 10.
ending the processing of deferred insns


mulq

Dataflow summary:
;;  fully invalidated by EH      1 [s1] 2 [s2] 3 [s3] 4 [s4] 5 [s5] 6 [s6] 7 [s7] 8 [s8] 9 [s9] 10 [s10] 11 [s11] 12 [s12] 13 [s13] 14 [s14]
;;  hardware regs used   28 [sp] 64 [?fp] 65 [?ap]
;;  regular block artificial uses        26 [fp] 28 [sp] 64 [?fp] 65 [?ap]
;;  eh block artificial uses     26 [fp] 28 [sp] 64 [?fp] 65 [?ap]
;;  entry block defs     1 [s1] 2 [s2] 3 [s3] 4 [s4] 5 [s5] 6 [s6] 7 [s7] 8 [s8] 26 [fp] 28 [sp] 30 [lr] 64 [?fp] 65 [?ap]
;;  exit block uses      1 [s1] 26 [fp] 28 [sp] 30 [lr] 64 [?fp]
;;  regs ever live       1 [s1] 2 [s2]
;;  ref usage   r1={2d,3u} r2={1d,1u} r3={1d} r4={1d} r5={1d} r6={1d} r7={1d} r8={1d} r26={1d,2u} r28={1d,2u} r30={1d,1u} r64={1d,2u} r65={1d,1u} r82={1d,1u} r83={1d,1u} r84={1d,1u} r85={1d,1u} r86={1d,1u} r87={1d,1u}
;;    total ref usage 38{20d,18u,0e} in 8{8 regular + 0 call} insns.
;; basic block 2, loop depth 0, count 1073741824 (estimated locally), maybe hot
;;  prev block 0, next block 1, flags: (RTL, MODIFIED)
;;  pred:       ENTRY [always]  count:1073741824 (estimated locally) (FALLTHRU)
;; bb 2 artificial_defs: { }
;; bb 2 artificial_uses: { u0(26){ }u1(28){ }u2(64){ }u3(65){ }}
;; lr  in        1 [s1] 2 [s2] 26 [fp] 28 [sp] 30 [lr] 64 [?fp] 65 [?ap]
;; lr  use       1 [s1] 2 [s2] 26 [fp] 28 [sp] 64 [?fp] 65 [?ap]
;; lr  def       1 [s1] 78 80 82 83 84 85 86 87
;; live  in      1 [s1] 2 [s2] 26 [fp] 28 [sp] 30 [lr] 64 [?fp] 65 [?ap]
;; live  gen     1 [s1] 78 80 82 83 84 85 86 87
;; live  kill   
(note 7 0 21 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(insn 21 7 2 2 (set (reg:SI 86)
        (reg:SI 1 s1 [ op1D.1380 ])) "mulq.c":1:34 -1
     (expr_list:REG_DEAD (reg:SI 1 s1 [ op1D.1380 ])
        (nil)))
(note 2 21 22 2 NOTE_INSN_DELETED)
(insn 22 2 4 2 (set (reg:SI 87)
        (reg:SI 2 s2 [ op2D.1381 ])) "mulq.c":1:34 -1
     (expr_list:REG_DEAD (reg:SI 2 s2 [ op2D.1381 ])
        (nil)))
(note 4 22 6 2 NOTE_INSN_DELETED)
(note 6 4 9 2 NOTE_INSN_FUNCTION_BEG)
(insn 9 6 10 2 (set (reg:SI 82 [ op1D.1380 ])
        (sign_extend:SI (subreg:HI (reg:SI 86) 0))) "mulq.c":2:20 60 {extendhisi2}
     (expr_list:REG_DEAD (reg:SI 86)
        (nil)))
(insn 10 9 11 2 (set (reg:SI 83 [ op2D.1381 ])
        (sign_extend:SI (subreg:HI (reg:SI 87) 0))) "mulq.c":2:32 60 {extendhisi2}
     (expr_list:REG_DEAD (reg:SI 87)
        (nil)))
(insn 11 10 12 2 (set (reg:SI 84)
        (mult:SI (reg:SI 82 [ op1D.1380 ])
            (reg:SI 83 [ op2D.1381 ]))) "mulq.c":2:30 5 {mulsi3}
     (expr_list:REG_DEAD (reg:SI 83 [ op2D.1381 ])
        (expr_list:REG_DEAD (reg:SI 82 [ op1D.1380 ])
            (nil))))
(insn 12 11 18 2 (set (reg:SI 85)
        (ashiftrt:SI (reg:SI 84)
            (const_int 15 [0xf]))) "mulq.c":2:43 39 {ashrsi3}
     (expr_list:REG_DEAD (reg:SI 84)
        (nil)))
(insn 18 12 19 2 (set (reg/i:HI 1 s1)
        (subreg:HI (reg:SI 85) 0)) "mulq.c":3:1 68 {*movhi}
     (expr_list:REG_DEAD (reg:SI 85)
        (nil)))
(insn 19 18 0 2 (use (reg/i:HI 1 s1)) "mulq.c":3:1 -1
     (nil))
;;  succ:       EXIT [always]  count:1073741824 (estimated locally) (FALLTHRU)
;; lr  out       1 [s1] 26 [fp] 28 [sp] 30 [lr] 64 [?fp] 65 [?ap]
;; live  out     1 [s1] 26 [fp] 28 [sp] 30 [lr] 64 [?fp] 65 [?ap]


;; Combiner totals: 12 attempts, 12 substitutions (2 requiring new space),
;; 2 successes.
