https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101929
--- Comment #8 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Richard Biener from comment #7)
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index 9188d727e33..7f1f12fb6c6 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -2374,7 +2375,7 @@ fail:
>               n_vector_builds++;
>             }
>         }
> -      if (all_uniform_p
> +      if ((all_uniform_p && !two_operators)
>           || n_vector_builds > 1
>           || (n_vector_builds == children.length ()
>               && is_a <gphi *> (stmt_info->stmt)))
>
>
> will re-enable the vectorization - it evades the vect_construct cost bump
> by instead using scalar_to_vec (aka splat) which has not yet been fixed to
> account for a possible gpr to xmm move (so it would be a temporary "solution"
> at best).
>
> Another change to mute the effect somewhat (but not fixing x264) that was
> mentioned is
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index b2bf90576d5..acf2cc977b4 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -22595,7 +22595,7 @@ ix86_builtin_vectorization_cost (enum
> vect_cost_for_stmt type_of_cost,
>        case vec_construct:
>         {
>           /* N element inserts into SSE vectors.  */
> -         int cost = TYPE_VECTOR_SUBPARTS (vectype) * ix86_cost->sse_op;
> +         int cost = (TYPE_VECTOR_SUBPARTS (vectype) - 1) *
> ix86_cost->sse_op;

n - 1 is right for a 128-bit vector, but for a 256-bit vector shouldn't it be
n - 2, since we have a separate cost for vinserti128, and n - 4 for a 512-bit
one?
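
To spell out the arithmetic behind that question: assuming each 128-bit lane
is built on its own and the first element of a lane needs no insert, a vector
of n elements split into k lanes takes n - k element inserts, with the inserts
that combine the lanes (vinserti128 and the like) costed separately. A minimal
sketch of that count follows; the helper name element_inserts is hypothetical
and is not part of GCC.

```cpp
// Hypothetical helper, not GCC code: number of per-element inserts needed
// to build a vector of `nunits` elements that is `vector_bits` wide,
// assuming every 128-bit lane gets its first element for free and the
// lane-combining inserts are costed elsewhere.
constexpr int element_inserts(int nunits, int vector_bits)
{
    // Vectors narrower than 128 bits are treated as a single lane.
    return nunits - (vector_bits >= 128 ? vector_bits / 128 : 1);
}

// Compile-time checks of the three cases from the comment above.
static_assert(element_inserts(4, 128) == 3, "128-bit: n - 1");
static_assert(element_inserts(8, 256) == 6, "256-bit: n - 2");
static_assert(element_inserts(16, 512) == 12, "512-bit: n - 4");
```

Under that assumption the vec_construct cost would be
element_inserts (n, bits) * ix86_cost->sse_op plus whatever is charged for
combining the lanes.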