This fixes PR56270: when two SLP instances share a load, we fail to properly initialize the vectorized SLP stmts. In fact, there is no support in SLP to "share" vectorized code between instances; the only support that is explicitly wired in is to have different permutations of a load inside a single SLP instance.
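
To make the failure mode concrete, here is a toy model in plain C, not GCC code. It only loosely mirrors the situation: one load shared by two instances, a cached "vectorized stmt" on the load, and an early-out that reuses the cache without looking at the requested permutation. All names (toy_load, toy_instance, toy_vectorize_load, toy_schedule, toy_second_instance_ok) and the "perm-A"/"perm-B" labels are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a load's stmt_vec_info; NOT a GCC type.  */
struct toy_load
{
  const char *vec_stmt;  /* plays the role of STMT_VINFO_VEC_STMT */
};

/* Toy stand-in for an SLP instance that wants LOAD in permutation PERM.  */
struct toy_instance
{
  struct toy_load *load;
  const char *perm;
};

/* "Vectorize" the load for one instance.  Models the early-out: if the
   load already has a vectorized stmt, reuse it without checking PERM.  */
static const char *
toy_vectorize_load (struct toy_load *load, const char *perm)
{
  assert (load != NULL && perm != NULL);
  if (load->vec_stmt)
    return load->vec_stmt;
  load->vec_stmt = perm;
  return load->vec_stmt;
}

/* Schedule N instances in order and record in GOT[] what each received.
   If RESET_AFTER is nonzero, clear the cached vectorized stmt of the
   instance's load after scheduling it.  */
static void
toy_schedule (struct toy_instance *insts, size_t n, int reset_after,
              const char **got)
{
  for (size_t i = 0; i < n; i++)
    {
      got[i] = toy_vectorize_load (insts[i].load, insts[i].perm);
      if (reset_after)
        insts[i].load->vec_stmt = NULL;
    }
}

/* Return 1 if the second of two instances sharing one load receives
   the permutation it asked for, else 0.  */
int
toy_second_instance_ok (int reset_after)
{
  struct toy_load shared = { NULL };
  struct toy_instance insts[2] = { { &shared, "perm-A" },
                                   { &shared, "perm-B" } };
  const char *got[2];

  toy_schedule (insts, 2, reset_after, got);
  return strcmp (got[1], "perm-B") == 0;
}
```

In this model, toy_second_instance_ok (0) returns 0 because the second instance is handed the first instance's result, while toy_second_instance_ok (1) returns 1 because the cache is cleared between instances.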

Without digging further into cross-instance SLP vectorized code sharing (or even intra-instance sharing), the following avoids the special early-out across instances by resetting the vectorized stmts of an instance's loads after scheduling it. Another fix would possibly be to do

  /* Check if the chain of loads is already vectorized. */
  if (STMT_VINFO_VEC_STMT (vinfo_for_stmt (first_stmt)))
    {
      *vec_stmt = STMT_VINFO_VEC_STMT (stmt_info);
      return true;
    }

only for !slp and, for SLP, to figure out an equivalent condition using SLP_TREE_VEC_STMTS.

Jakub, do you have any insights into this code, especially what kind of permutations we support?

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Thanks,
Richard.

2013-03-04  Richard Biener  <rguent...@suse.de>

        PR tree-optimization/56270
        * tree-vect-slp.c (vect_schedule_slp): Clear vectorized stmts
        of loads after scheduling an SLP instance.

        * gcc.dg/vect/slp-38.c: New testcase.

Index: gcc/tree-vect-slp.c
===================================================================
*** gcc/tree-vect-slp.c (revision 196423)
--- gcc/tree-vect-slp.c (working copy)
*************** vect_schedule_slp (loop_vec_info loop_vi
*** 3138,3144 ****
  {
    vec<slp_instance> slp_instances;
    slp_instance instance;
!   unsigned int i, vf;
    bool is_store = false;
  
    if (loop_vinfo)
--- 3148,3155 ----
  {
    vec<slp_instance> slp_instances;
    slp_instance instance;
!   slp_tree loads_node;
!   unsigned int i, j, vf;
    bool is_store = false;
  
    if (loop_vinfo)
*************** vect_schedule_slp (loop_vec_info loop_vi
*** 3157,3162 ****
--- 3168,3181 ----
        /* Schedule the tree of INSTANCE. */
        is_store = vect_schedule_slp_instance (SLP_INSTANCE_TREE (instance),
                                               instance, vf);
+ 
+       /* Clear STMT_VINFO_VEC_STMT of all loads. With shared loads
+          between SLP instances we fail to properly initialize the
+          vectorized SLP stmts and confuse different load permutations. */
+       FOR_EACH_VEC_ELT (SLP_INSTANCE_LOADS (instance), j, loads_node)
+         STMT_VINFO_VEC_STMT
+           (vinfo_for_stmt (SLP_TREE_SCALAR_STMTS (loads_node)[0])) = NULL;
+ 
        if (dump_enabled_p ())
          dump_printf_loc (MSG_NOTE, vect_location,
                           "vectorizing stmts using SLP.");
Index: gcc/testsuite/gcc.dg/vect/slp-38.c
===================================================================
*** gcc/testsuite/gcc.dg/vect/slp-38.c (revision 0)
--- gcc/testsuite/gcc.dg/vect/slp-38.c (working copy)
***************
*** 0 ****
--- 1,24 ----
+ /* { dg-do compile } */
+ 
+ typedef struct {
+     float l, h;
+ } tFPinterval;
+ 
+ tFPinterval X[1024];
+ tFPinterval Y[1024];
+ tFPinterval Z[1024];
+ 
+ void Compute(void)
+ {
+     int d;
+     for (d= 0; d < 1024; d++)
+       {
+         Y[d].l= X[d].l + X[d].h;
+         Y[d].h= Y[d].l;
+         Z[d].l= X[d].l;
+         Z[d].h= X[d].h;
+       }
+ }
+ 
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 1 "vect" { target { vect_float && vect_perm } } } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
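
The testcase above is compile-only. As a rough sketch of what a runtime variant could check, the following wraps the same loop in a self-checking helper. The helper name check_compute, the N macro, and the input values are assumptions made for illustration and are not part of the patch; the inputs are small integers so that every float sum is exact.

```c
#include <assert.h>

#define N 1024

typedef struct {
    float l, h;
} tFPinterval;

tFPinterval X[N];
tFPinterval Y[N];
tFPinterval Z[N];

/* Same loop body as in slp-38.c.  */
void Compute(void)
{
    int d;
    for (d = 0; d < N; d++)
      {
        Y[d].l = X[d].l + X[d].h;
        Y[d].h = Y[d].l;
        Z[d].l = X[d].l;
        Z[d].h = X[d].h;
      }
}

/* Hypothetical helper: fill X with exactly representable values, run
   Compute, and assert on every element.  Returns the number of
   elements checked.  */
int check_compute(void)
{
    int d;

    for (d = 0; d < N; d++)
      {
        X[d].l = (float) d;
        X[d].h = (float) (2 * d + 1);
      }

    Compute ();

    for (d = 0; d < N; d++)
      {
        float sum = (float) d + (float) (2 * d + 1);
        assert (Y[d].l == sum);
        assert (Y[d].h == sum);
        assert (Z[d].l == (float) d);
        assert (Z[d].h == (float) (2 * d + 1));
      }
    return N;
}
```

Calling check_compute () from main and comparing the result with 1024 is enough to exercise the loop. Whether the loop is actually vectorized still has to be confirmed from the vect dump, as the dg-final directive does.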