https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93565

Segher Boessenkool <segher at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |segher at gcc dot gnu.org

--- Comment #1 from Segher Boessenkool <segher at gcc dot gnu.org> ---
Well, on power9 I get just

        cmpdi 0,3,0
        beq 0,.L2
        cnttzd 3,3
        sldi 9,3,2
        lwzx 9,4,9
        or 3,9,3
        stw 3,0(4)
.L2:
        li 3,0
        blr

so it is more than just CTZ_DEFINED_VALUE_AT_ZERO = 2.

(Also on power7, power8, but those don't have that neat ctz insn).
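
For reference, here is a C function consistent with the assembly above. It is a reconstruction from the generated code, not the testcase attached to the bug: the name f, the parameter names and the exact types are assumptions.

```c
/* Assumed reconstruction from the power9 output; not the PR's testcase.
   r3 holds x, r4 holds p.  */
int f(unsigned long x, int *p)
{
    if (x != 0) {                  /* cmpdi 0,3,0 ; beq 0,.L2 */
        int n = __builtin_ctzl(x); /* cnttzd 3,3 */
        p[0] = p[n] | n;           /* sldi, lwzx, or, stw */
    }
    return 0;                      /* li 3,0 ; blr */
}
```

The zero check guards the ctz, so the value of ctz at zero is never needed on this path.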


On aarch64, combine starts with

insn_cost 4 for    43: r106:DI=x0:DI
      REG_DEAD x0:DI
insn_cost 4 for     2: r98:DI=r106:DI
      REG_DEAD r106:DI
insn_cost 4 for    44: r107:DI=x1:DI
      REG_DEAD x1:DI
insn_cost 4 for     3: r99:DI=r107:DI
      REG_DEAD r107:DI
insn_cost 4 for     7: cc:CC=cmp(r98:DI,0)
insn_cost 4 for     8: pc={(cc:CC==0)?L17:pc}
      REG_DEAD cc:CC
      REG_BR_PROB 536870916
insn_cost 4 for    10: r100:DI=ctz(r98:DI)
      REG_DEAD r98:DI
insn_cost 4 for    12: r101:DI=sign_extend(r100:DI#0)
insn_cost 16 for    14: r104:SI=[r101:DI*0x4+r99:DI]
      REG_DEAD r101:DI
insn_cost 4 for    15: r103:SI=r104:SI|r100:DI#0
      REG_DEAD r104:SI
      REG_DEAD r100:DI
insn_cost 4 for    16: [r99:DI]=r103:SI
      REG_DEAD r103:SI
      REG_DEAD r99:DI
insn_cost 4 for    23: x0:DI=0
insn_cost 0 for    24: use x0:DI

r100 (set in 10) is used later, just like r101 (set in 12).

Trying 10 -> 12:
   10: r100:DI=ctz(r98:DI)
      REG_DEAD r98:DI
   12: r101:DI=sign_extend(r100:DI#0)

Successfully matched this instruction:
(set (reg:DI 100)
    (ctz:DI (reg/v:DI 98 [ x ])))
Successfully matched this instruction:
(set (reg:DI 101 [ _9 ])
    (ctz:DI (reg/v:DI 98 [ x ])))
allowing combination of insns 10 and 12
original costs 4 + 4 = 8
replacement costs 4 + 4 = 8


So, it is *not* duplicating the ctz: the duplicate was already there to start
with, in some sense.
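
One way to read the 10 -> 12 result: the ctz of a nonzero 64-bit value lies in 0..63, so sign-extending its low 32 bits gives back the same value, and insn 12 can be written as a plain ctz of r98. A minimal sketch of that identity in C follows; the helper names are hypothetical, and this only illustrates the arithmetic, not what combine does internally.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers, for illustration only.  */

/* Like insn 10: DImode ctz.  Only valid for x != 0.  */
static int64_t ctz_di(uint64_t x)
{
    return __builtin_ctzll(x);
}

/* Like insn 12 applied to insn 10: take the low 32 bits of the ctz
   result and sign-extend them back to 64 bits.  */
static int64_t ctz_then_sext(uint64_t x)
{
    int32_t low = (int32_t)ctz_di(x);
    return (int64_t)low;
}

/* Returns 1 if both forms agree for every single-bit input.  */
static int forms_agree(void)
{
    for (int i = 0; i < 64; i++) {
        uint64_t x = (uint64_t)1 << i;
        if (ctz_di(x) != ctz_then_sext(x))
            return 0;
    }
    return 1;
}
```

Since both forms agree, the two ctz insns after the combination compute the same value into r100 and r101, at the same total cost (4 + 4 = 8) as the original pair.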
