https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91446
--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Index: gcc/config/i386/x86-tune-costs.h
===================================================================
--- gcc/config/i386/x86-tune-costs.h    (revision 274422)
+++ gcc/config/i386/x86-tune-costs.h    (working copy)
@@ -1442,7 +1442,7 @@ struct processor_costs skylake_cost = {
   {4, 4, 4},                           /* cost of loading integer registers
                                           in QImode, HImode and SImode.
                                           Relative to reg-reg move (2).  */
-  {6, 6, 3},                           /* cost of storing integer registers */
+  {6, 6, 6},                           /* cost of storing integer registers */
   2,                                   /* cost of reg,reg fld/fst */
   {6, 6, 8},                           /* cost of loading fp registers
                                           in SFmode, DFmode and XFmode */

produces

foo:
.LFB0:
        .cfi_startproc
        vmovq   %rdi, %xmm1
        subq    $40, %rsp
        .cfi_def_cfa_offset 48
        vpinsrq $1, %rsi, %xmm1, %xmm0
        vmovq   %rdx, %xmm2
        vmovaps %xmm0, (%rsp)
        movq    %rsp, %rdi
        vpinsrq $1, %rcx, %xmm2, %xmm0
        vmovaps %xmm0, 16(%rsp)
        call    bar
        addq    $40, %rsp
        .cfi_def_cfa_offset 8
        ret

It may appear odd that we don't use AVX256; this is because of

t.i:17:3: note:   === vect_analyze_data_ref_accesses ===
t.i:17:3: note:   Detected interleaving store t.width and t.height
t.i:17:3: note:   Detected interleaving store t.x and t.y
t.i:17:3: note:   Detected interleaving store of size 2
t.i:17:3: note:         t.width = width_2(D);
t.i:17:3: note:         t.height = height_4(D);
t.i:17:3: note:   Detected interleaving store of size 2
t.i:17:3: note:         t.x = x_6(D);
t.i:17:3: note:         t.y = y_8(D);

and thus we are "confused" about the different sign of the fields, which
ultimately leads to different vector types for the two store groups.  That
would make a difference if sign-dependent arithmetic were performed, but
not in this particular case.  On GIMPLE we'd also need nop-conversions to
make the IL checker happy.

On this ground the bug would be valid, but it is not about costs (you may
want to open a separate bug for this issue).
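
For illustration, a testcase of roughly the following shape would be
consistent with the dump and the assembly above.  This is only a sketch:
the struct name `info`, the exact field types (two unsigned 64-bit fields
followed by two signed ones) and the body of `bar` are assumptions made
here, not taken from the report.  In the real testcase `bar` would be an
external function; it is defined locally (and kept out of line) only so
the snippet is self-contained.

```c
/* Hypothetical layout: two unsigned fields followed by two signed
   fields of the same width.  The vectorizer sees two store groups of
   size 2 (width/height and x/y) whose element types differ in sign.  */
typedef struct
{
  unsigned long width, height;
  long x, y;
} info;

/* Stand-in for the external callee; it just records what it was given.  */
static info last_seen;

__attribute__ ((noinline)) void
bar (info *p)
{
  last_seen = *p;
}

/* Four scalar stores to adjacent fields, followed by a call that makes
   the stores observable.  */
void
foo (unsigned long width, unsigned long height, long x, long y)
{
  info t;
  t.width = width;
  t.height = height;
  t.x = x;
  t.y = y;
  bar (&t);
}
```

With such a layout the four stores cover 32 contiguous bytes, so a single
256-bit store would be possible in principle, but because the unsigned and
signed pairs are grouped separately the result is two 128-bit stores.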