https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91446

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Index: gcc/config/i386/x86-tune-costs.h
===================================================================
--- gcc/config/i386/x86-tune-costs.h    (revision 274422)
+++ gcc/config/i386/x86-tune-costs.h    (working copy)
@@ -1442,7 +1442,7 @@ struct processor_costs skylake_cost = {
   {4, 4, 4},                           /* cost of loading integer registers
                                           in QImode, HImode and SImode.
                                           Relative to reg-reg move (2).  */
-  {6, 6, 3},                           /* cost of storing integer registers */
+  {6, 6, 6},                           /* cost of storing integer registers */
   2,                                   /* cost of reg,reg fld/fst */
   {6, 6, 8},                           /* cost of loading fp registers
                                           in SFmode, DFmode and XFmode */

produces

foo:
.LFB0:
        .cfi_startproc
        vmovq   %rdi, %xmm1
        subq    $40, %rsp
        .cfi_def_cfa_offset 48
        vpinsrq $1, %rsi, %xmm1, %xmm0
        vmovq   %rdx, %xmm2
        vmovaps %xmm0, (%rsp)
        movq    %rsp, %rdi
        vpinsrq $1, %rcx, %xmm2, %xmm0
        vmovaps %xmm0, 16(%rsp)
        call    bar
        addq    $40, %rsp
        .cfi_def_cfa_offset 8
        ret

It may appear odd that we don't use AVX256; this is because of

t.i:17:3: note:   === vect_analyze_data_ref_accesses ===
t.i:17:3: note:   Detected interleaving store t.width and t.height
t.i:17:3: note:   Detected interleaving store t.x and t.y
t.i:17:3: note:   Detected interleaving store of size 2
t.i:17:3: note:         t.width = width_2(D);
t.i:17:3: note:         t.height = height_4(D);
t.i:17:3: note:   Detected interleaving store of size 2
t.i:17:3: note:         t.x = x_6(D);
t.i:17:3: note:         t.y = y_8(D);

and thus we are "confused" by the different signedness of the fields, which
ultimately leads to different vector types.  That would make a difference
if sign-dependent arithmetic were performed, but it does not in this particular
case.  On GIMPLE we'd also need nop-conversions to make the IL checker happy.
On these grounds the bug would be valid, but it is not about costs (you may want
to open a separate bug for this issue).
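
For readers without the testcase at hand, the shape of code that triggers
this can be sketched as below.  This is a hypothetical reconstruction, not
the reporter's t.i: the field and parameter names come from the dump above,
the 64-bit width is inferred from the %rdi/%rsi/%rdx/%rcx moves, and the
struct name, the choice of which pair is unsigned, and the body of bar are
assumptions made only so the sketch is self-contained.

```c
/* Hypothetical reconstruction; the real testcase (t.i) is not shown here. */
typedef struct {
  unsigned long width, height;  /* assumed: the unsigned pair */
  long x, y;                    /* assumed: the signed pair */
} rect_info;

/* Stand-in for the external callee; it only records what it was passed. */
static rect_info last_seen;

__attribute__((noinline)) void bar(rect_info *p)
{
  last_seen = *p;
}

void foo(unsigned long width, unsigned long height, long x, long y)
{
  rect_info t;
  /* Two groups of two adjacent stores each.  The groups differ only in
     the signedness of the element type, so they end up with different
     vector types and are vectorized as two 128-bit stores rather than
     a single 256-bit one.  */
  t.width = width;
  t.height = height;
  t.x = x;
  t.y = y;
  bar(&t);
}
```

Since the four stores only copy bits, treating all of them as one unsigned
(or one signed) 4-element group would give the same memory contents, which
is why the signedness split is unnecessary here.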
