https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123343
--- Comment #9 from Zhongyao Chen <chenzhongyao.hit at gmail dot com> ---
New finding: multi-reduction SLP group formation seems to be the cause.
If I skip forming the SLP group with:
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 70470612411..425842e90d9 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -4834,7 +4834,7 @@ vect_analyze_slp_reductions (loop_vec_info loop_vinfo,
}
}
- if (scalar_stmts.length () > 1)
+ if (0 && scalar_stmts.length () > 1)
{
/* Try to form a reduction group. */
unsigned int group_size = scalar_stmts.length ();
it produces the expected efficient assembly. So we may need a cost check to
reject multi-reduction SLP groups here.
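
For reference, a minimal sketch of the kind of loop I mean by "multi-reduction".
This is a hypothetical example, not the test case from this PR, and the name
sum_pair is made up for illustration. The loop carries two independent sum
reductions, so as far as I understand scalar_stmts would hold more than one
statement and the vectorizer would try to form one SLP reduction group from
both:

```c
#include <stddef.h>

/* Hypothetical illustration only: two independent sum reductions
   (sa and sb) accumulated in the same loop.  */
void
sum_pair (const int *a, const int *b, size_t n, int *out_a, int *out_b)
{
  int sa = 0, sb = 0;
  for (size_t i = 0; i < n; i++)
    {
      sa += a[i];   /* first reduction */
      sb += b[i];   /* second reduction */
    }
  *out_a = sa;
  *out_b = sb;
}
```

Whether grouping such reductions is profitable likely depends on the target and
the element types, which is why a cost check seems better than disabling the
group unconditionally.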