Richard Biener <rguent...@suse.de> writes: > On Wed, 28 Oct 2020, Christophe Lyon wrote: > >> On Wed, 28 Oct 2020 at 11:27, Christophe Lyon >> <christophe.l...@linaro.org> wrote: >> > >> > On Tue, 27 Oct 2020 at 13:18, Richard Biener <rguent...@suse.de> wrote: >> > > >> > > This makes SLP discovery detect backedges by seeding the bst_map with >> > > the node to be analyzed so it can be picked up from recursive calls. >> > > This removes the need to discover backedges in a separate walk. >> > > >> > > This enables SLP build to handle PHI nodes in full, continuing >> > > the SLP build to non-backedges. For loop vectorization this >> > > enables outer loop vectorization of nested SLP cycles and for >> > > BB vectorization this enables vectorization of PHIs at CFG merges. >> > > >> > > It also turns code generation into a SCC discovery walk to handle >> > > irreducible regions and nodes only reachable via backedges where >> > > we now also fill in vectorized backedge defs. >> > > >> > > This requires sanitizing the SLP tree for SLP reduction chains even >> > > more, manually filling the backedge SLP def. >> > > >> > > This also exposes the fact that CFG copying (and edge splitting >> > > until I fixed that) ends up with different edge order in the >> > > copy which doesn't play well with the desired 1:1 mapping of >> > > SLP PHI node children and edges for epilogue vectorization. >> > > I've tried to fixup CFG copying here but this really looks >> > > like a dead (or expensive) end there so I've done fixup in >> > > slpeel_tree_duplicate_loop_to_edge_cfg instead for the cases >> > > we can run into. >> > > >> > > There's still NULLs in the SLP_TREE_CHILDREN vectors and I'm >> > > not sure it's possible to eliminate them all this stage1 so the >> > > patch has quite some checks for this case all over the place. >> > > >> > > Bootstrapped and tested on x86_64-unknown-linux-gnu. SPEC CPU 2017 >> > > and SPEC CPU 2006 successfully built and tested. >> > > >> > > Will push soon. >> > > >> > > Richard. >> > > >> > > 2020-10-27 Richard Biener <rguent...@suse.de> >> > > >> > > * gimple.h (gimple_expr_type): For PHIs return the type >> > > of the result. >> > > * tree-vect-loop-manip.c >> > > (slpeel_tree_duplicate_loop_to_edge_cfg): >> > > Make sure edge order into copied loop headers line up with the >> > > originals. >> > > * tree-vect-loop.c (vect_transform_cycle_phi): Handle nested >> > > loops with SLP. >> > > (vectorizable_phi): New function. >> > > (vectorizable_live_operation): For BB vectorization compute >> > > insert >> > > location here. >> > > * tree-vect-slp.c (vect_free_slp_tree): Deal with NULL >> > > SLP_TREE_CHILDREN entries. >> > > (vect_create_new_slp_node): Add overloads with pre-existing node >> > > argument. >> > > (vect_print_slp_graph): Likewise. >> > > (vect_mark_slp_stmts): Likewise. >> > > (vect_mark_slp_stmts_relevant): Likewise. >> > > (vect_gather_slp_loads): Likewise. >> > > (vect_optimize_slp): Likewise. >> > > (vect_slp_analyze_node_operations): Likewise. >> > > (vect_bb_slp_scalar_cost): Likewise. >> > > (vect_remove_slp_scalar_calls): Likewise. >> > > (vect_get_and_check_slp_defs): Handle PHIs. >> > > (vect_build_slp_tree_1): Handle PHIs. >> > > (vect_build_slp_tree_2): Continue SLP build, following PHI >> > > arguments. Fix memory leak. >> > > (vect_build_slp_tree): Put stub node into the hash-map so >> > > we can discover cycles directly. >> > > (vect_build_slp_instance): Set the backedge SLP def for >> > > reduction chains. >> > > (vect_analyze_slp_backedges): Remove. >> > > (vect_analyze_slp): Do not call it. >> > > (vect_slp_convert_to_external): Release >> > > SLP_TREE_LOAD_PERMUTATION. >> > > (vect_slp_analyze_node_operations): Handle stray failed >> > > backedge defs by failing. >> > > (vect_slp_build_vertices): Adjust leaf condition. >> > > (vect_bb_slp_mark_live_stmts): Handle PHIs, use visited >> > > hash-set to handle cycles. >> > > (vect_slp_analyze_operations): Adjust. >> > > (vect_bb_partition_graph_r): Likewise. >> > > (vect_slp_function): Adjust split condition to allow CFG >> > > merges. >> > > (vect_schedule_slp_instance): Rename to ... >> > > (vect_schedule_slp_node): ... this. Move DFS walk to ... >> > > (vect_schedule_scc): ... this new function. >> > > (vect_schedule_slp): Call it. Remove ad-hoc vectorized >> > > backedge fill code. >> > > * tree-vect-stmts.c (vect_analyze_stmt): Call >> > > vectorizable_phi. >> > > (vect_transform_stmt): Likewise. >> > > (vect_is_simple_use): Handle vect_backedge_def. >> > > * tree-vectorizer.c (vec_info::new_stmt_vec_info): Only >> > > set loop header PHIs to vect_unknown_def_type for loop >> > > vectorization. >> > > * tree-vectorizer.h (enum vect_def_type): Add vect_backedge_def. >> > > (enum stmt_vec_info_type): Add phi_info_type. >> > > (vectorizable_phi): Declare. >> > > >> > > * gcc.dg/vect/bb-slp-54.c: New test. >> > > * gcc.dg/vect/bb-slp-55.c: Likewise. >> > > * gcc.dg/vect/bb-slp-56.c: Likewise. >> > > * gcc.dg/vect/bb-slp-57.c: Likewise. >> > > * gcc.dg/vect/bb-slp-58.c: Likewise. >> > > * gcc.dg/vect/bb-slp-59.c: Likewise. >> > > * gcc.dg/vect/bb-slp-60.c: Likewise. >> > > * gcc.dg/vect/bb-slp-61.c: Likewise. >> > > * gcc.dg/vect/bb-slp-62.c: Likewise. >> > > * gcc.dg/vect/bb-slp-63.c: Likewise. >> > > * gcc.dg/vect/bb-slp-64.c: Likewise. >> > > * gcc.dg/vect/bb-slp-65.c: Likewise. >> > > * gcc.dg/vect/bb-slp-66.c: Likewise. >> > > * gcc.dg/vect/vect-outer-slp-1.c: Likewise. >> > > * gfortran.dg/vect/O3-bb-slp-1.f: Likewise. >> > > * gfortran.dg/vect/O3-bb-slp-2.f: Likewise. >> > > * g++.dg/vect/simd-11.cc: Likewise. >> > >> > >> > I've filed https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97616 >> > because bb-slp-58.c and bb-slp-59.c fail on arm. >> > >> >> And it causes a regression on aarch64: >> FAIL: gcc.target/aarch64/sve/reduc_strict_4.c -march=armv8.2-a+sve >> scan-assembler-times \\tfadda\\td[0-9]+, p[0-7], d[0-9]+, z[0-9]+\\.d >> 4 > > I'm testing a fix for this one. > >> FAIL: gcc.target/aarch64/sve/reduc_strict_5.c -march=armv8.2-a+sve >> scan-assembler-times \\tfadda\\td[0-9]+, p[0-7], d[0-9]+, z[0-9]+\\.d >> 6 > > That one is more odd. For some reason we arrive at > > /home/rguenther/src/trunk/gcc/testsuite/gcc.target/aarch64/sve/reduc_strict_5.c:10:21: > > note: vect_is_simple_use: operand 0.0, type of def: constant > /home/rguenther/src/trunk/gcc/testsuite/gcc.target/aarch64/sve/reduc_strict_5.c:10:21: > > missed: Build SLP failed: invalid type of def for variable-length SLP > 0.0 > > we previously didn't analyze the initial value (we will later not > build a SLP node for it but the defs are still analyzed by > vect_get_and_check_slp_defs which I can fix). Sill I want to > know why we cannot create a vector of all zeros. The check is > > if ((dt == vect_constant_def > || dt == vect_external_def) > && !GET_MODE_SIZE (vinfo->vector_mode).is_constant () > && (TREE_CODE (type) == BOOLEAN_TYPE > || !can_duplicate_and_interleave_p (vinfo, stmts.length > (), > type))) > { > if (dump_enabled_p ()) > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > "Build SLP failed: invalid type of def " > "for variable-length SLP %T\n", oprnd); > return -1; > } > > so the reason must be the group size (12)? It succeeds with 8. > But then the check looks totally premature at the above point > because for sure an all-same constant can be always duplicated? > > So - what breaks if we simply remove the above? Richard?
Well, I think there are two issues: - Should all-same constants be OK? Agree they should, but it would involve extending the check and making a promise that the values won't change later (and so invalidate the analysis). - Do we need to check a condition at all? If we don't, duplicate_and_interleave will ICE when trying to build invalid variable-length constant or external definitions. So we need to check it somewhere, even if it's not here. Thanks, Richard