https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89114

            Bug ID: 89114
           Summary: rtx_cost of VEC_SELECT, VEC_CONCAT and VEC_DUPLICATE
                    with memory operands is wrong
           Product: gcc
           Version: 9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

Split out from PR89049.  On its testcase combine is willing to elide an
unnecessary %ymm build-up, but the target's RTX cost makes that unprofitable.

See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89049#c5

So with the following (bogus) patch

Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c      (revision 268383)
+++ gcc/config/i386/i386.c      (working copy)
@@ -40848,7 +40848,7 @@ ix86_rtx_costs (rtx x, machine_mode mode
         recognizable.  In which case they all pretty much have the
         same cost.  */
      *total = cost->sse_op;
-     return true;
+     return false;
     case VEC_MERGE:
       mask = XEXP (x, 2);
       /* This is masked instruction, assume the same cost,

we get combine to do

Trying 11 -> 25:
   11: r105:V8SF=vec_concat(r106:V4SF,[r85:DI+0x10])
   25: r111:V4SF=vec_select(r105:V8SF,parallel)
      REG_DEAD r105:V8SF
Successfully matched this instruction:
(set (reg:V4SF 111)
    (mem:V4SF (plus:DI (reg:DI 85 [ ivtmp.11 ])
            (const_int 16 [0x10])) [1 MEM[base: _2, offset: 0B]+16 S16 A32]))
allowing combination of insns 11 and 25
original costs 16 + 12 = 28
replacement cost 12

and we elide the %ymm build:

.L2:
        vmovups (%rdi), %xmm1
        addq    $32, %rdi
        vaddss  %xmm1, %xmm0, %xmm0
        vshufps $85, %xmm1, %xmm1, %xmm2
        vaddss  %xmm2, %xmm0, %xmm0
        vunpckhps       %xmm1, %xmm1, %xmm2
        vshufps $255, %xmm1, %xmm1, %xmm1
        vaddss  %xmm2, %xmm0, %xmm0
        vaddss  %xmm1, %xmm0, %xmm0
        vmovups -16(%rdi), %xmm1
        vshufps $85, %xmm1, %xmm1, %xmm2
        vaddss  %xmm1, %xmm0, %xmm0
        vaddss  %xmm2, %xmm0, %xmm0
        vunpckhps       %xmm1, %xmm1, %xmm2
        vshufps $255, %xmm1, %xmm1, %xmm1
        vaddss  %xmm2, %xmm0, %xmm0
        vaddss  %xmm1, %xmm0, %xmm0
        cmpq    %rdi, %rax
        jne     .L2

The patch is bogus because the intention of not scanning sub-rtxen was
to match the various shuffle patterns, which look something like
(vec_select (vec_concat ..) ...).

Not sure if there's a helper in i386.c to extract/cost a single MEM
sub-rtx, but the right course of action would be to cost such a memory
operand properly, without scanning all sub-rtxen.
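
To illustrate the trade-off, here is a minimal toy model in C of a cost
hook whose return value decides whether operand costs are added.  It is
not GCC code: the toy_* names, the struct layout and the unit costs are
all hypothetical and only mimic the "return true means the total is
final, return false means also cost the operands" convention that the
patch above flips.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for RTX codes; not GCC's real enum rtx_code.  */
enum toy_code { TOY_REG, TOY_MEM, TOY_VEC_CONCAT, TOY_VEC_SELECT };

struct toy_rtx {
  enum toy_code code;
  const struct toy_rtx *op[2];  /* Operands, NULL when unused.  */
};

/* Hypothetical unit costs, chosen only for illustration.  */
enum { TOY_SSE_OP = 1, TOY_MEM_LOAD = 3 };

/* Models a target cost hook: store the cost of X itself in *TOTAL and
   return true if that value is final, false if the caller should add
   the costs of X's operands.  SCAN_VEC_OPERANDS plays the role of the
   "return false" variant for vector codes.  */
static bool
toy_hook (const struct toy_rtx *x, bool scan_vec_operands, int *total)
{
  switch (x->code)
    {
    case TOY_REG:
      *total = 0;
      return true;
    case TOY_MEM:
      *total = TOY_MEM_LOAD;
      return true;
    case TOY_VEC_CONCAT:
    case TOY_VEC_SELECT:
      *total = TOY_SSE_OP;
      return !scan_vec_operands;
    }
  *total = 0;
  return true;
}

/* Models the generic walker: ask the hook, and recurse into the
   operands only when the hook says its total is not final.  */
static int
toy_cost (const struct toy_rtx *x, bool scan_vec_operands)
{
  int total = 0;
  if (toy_hook (x, scan_vec_operands, &total))
    return total;
  for (size_t i = 0; i < 2; i++)
    if (x->op[i] != NULL)
      total += toy_cost (x->op[i], scan_vec_operands);
  return total;
}
```

In this model a vec_concat of a register and a memory operand costs 1
with the early return (the load is ignored) and 4 when operands are
scanned.  A vec_select of that vec_concat, which stands for a single
shuffle instruction, costs 1 with the early return but 5 when operands
are scanned, because the inner vec_concat is then counted as an
operation of its own.  That double counting is what the early return was
meant to avoid, and ignoring the load is the price paid for it.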
