On Sun, 10 Jul 2016, Uros Bizjak wrote:

> On Wed, Jul 6, 2016 at 3:18 PM, Richard Biener <rguent...@suse.de> wrote:
>
> >> > 2016-07-04  Richard Biener  <rguent...@suse.de>
> >> >
> >> >         PR rtl-optimization/68961
> >> >         * fwprop.c (propagate_rtx): Allow SUBREGs of VEC_CONCAT and CONCAT
> >> >         to simplify to a non-constant.
> >> >
> >> >         * gcc.target/i386/pr68961.c: New testcase.
> >>
> >> Thanks, LGTM.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, it causes
> >
> > FAIL: gcc.target/i386/sse2-load-multi.c scan-assembler-times movup 2
> >
> > as the peephole created for that testcase no longer applies as fwprop
> > does
> >
> > In insn 10, replacing
> >  (vec_concat:V2DF (vec_select:DF (reg:V2DF 91)
> >             (parallel [
> >                     (const_int 0 [0])
> >                 ]))
> >     (mem:DF (reg/f:DI 95) [0 S8 A128]))
> >  with (vec_concat:V2DF (reg:DF 93 [ MEM[(const double *)&a + 8B] ])
> >     (mem:DF (reg/f:DI 95) [0 S8 A128]))
> > Changed insn 10
> >
> > resulting in
> >
> >         movsd   a+8(%rip), %xmm0
> >         movhpd  a+16(%rip), %xmm0
> >
> > again rather than movupd.
> >
> > Uros, there is probably a missing peephole for the new form - can you
> > fix this as a followup or should I hold on this patch for a bit longer?
>
> No, please proceed with the patch, I'll fix this fallout with a
> followup patch in a couple of days.

Applied as r238238.

Is the following x86 change ok then which adjusts the vectorizer
vector construction cost to sth more sensible?  I have adjusted
the generic implementation in targhooks.c this way a few weeks ago
already.

Thanks,
Richard.

2016-07-12  Richard Biener  <rguent...@suse.de>

        * targhooks.c (default_builtin_vectorization_cost): Adjust
        vec_construct cost.
        * config/i386/i386.c (ix86_builtin_vectorization_cost): Likewise.

Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c      (revision 237196)
+++ gcc/config/i386/i386.c      (working copy)
@@ -49503,8 +49520,6 @@ static int
 ix86_builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost,
                                  tree vectype, int)
 {
-  unsigned elements;
-
   switch (type_of_cost)
     {
       case scalar_stmt:
@@ -49546,8 +49561,7 @@ ix86_builtin_vectorization_cost (enum ve
         return ix86_cost->vec_stmt_cost;
 
       case vec_construct:
-       elements = TYPE_VECTOR_SUBPARTS (vectype);
-       return ix86_cost->vec_stmt_cost * (elements / 2 + 1);
+       return ix86_cost->vec_stmt_cost * (TYPE_VECTOR_SUBPARTS (vectype) - 1);
 
       default:
         gcc_unreachable ();
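
To give a feel for how the vec_construct cost moves with this change, here is
a small standalone sketch of the two formulas from the hunk above.  It is not
GCC code: old_vec_construct_cost and new_vec_construct_cost are hypothetical
helper names, and plain ints stand in for ix86_cost->vec_stmt_cost and
TYPE_VECTOR_SUBPARTS (vectype).

```c
/* Standalone illustration only; these helpers are hypothetical and do not
   exist in GCC.  'vec_stmt_cost' stands in for ix86_cost->vec_stmt_cost and
   'elements' for TYPE_VECTOR_SUBPARTS (vectype).  */

/* Formula removed by the patch: vec_stmt_cost * (elements / 2 + 1).  */
static int
old_vec_construct_cost (int vec_stmt_cost, int elements)
{
  return vec_stmt_cost * (elements / 2 + 1);
}

/* Formula added by the patch: vec_stmt_cost * (elements - 1).  */
static int
new_vec_construct_cost (int vec_stmt_cost, int elements)
{
  return vec_stmt_cost * (elements - 1);
}
```

Assuming a vec_stmt_cost of 1, a 2-element vector goes from cost 2 to 1, a
4-element vector stays at 3, and an 8-element vector goes from 5 to 7, so the
new formula is cheaper for two lanes and grows faster for wider vectors.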