Richard Biener <rguent...@suse.de> writes:
> On Wed, 28 Oct 2020, Christophe Lyon wrote:
>
>> On Wed, 28 Oct 2020 at 11:27, Christophe Lyon
>> <christophe.l...@linaro.org> wrote:
>> >
>> > On Tue, 27 Oct 2020 at 13:18, Richard Biener <rguent...@suse.de> wrote:
>> > >
>> > > This makes SLP discovery detect backedges by seeding the bst_map with
>> > > the node to be analyzed so it can be picked up from recursive calls.
>> > > This removes the need to discover backedges in a separate walk.
>> > >
>> > > This enables SLP build to handle PHI nodes in full, continuing
>> > > the SLP build to non-backedges.  For loop vectorization this
>> > > enables outer loop vectorization of nested SLP cycles and for
>> > > BB vectorization this enables vectorization of PHIs at CFG merges.
>> > >
>> > > It also turns code generation into a SCC discovery walk to handle
>> > > irreducible regions and nodes only reachable via backedges where
>> > > we now also fill in vectorized backedge defs.
>> > >
>> > > This requires sanitizing the SLP tree for SLP reduction chains even
>> > > more, manually filling the backedge SLP def.
>> > >
>> > > This also exposes the fact that CFG copying (and edge splitting
>> > > until I fixed that) ends up with different edge order in the
>> > > copy which doesn't play well with the desired 1:1 mapping of
>> > > SLP PHI node children and edges for epilogue vectorization.
>> > > I've tried to fixup CFG copying here but this really looks
>> > > like a dead (or expensive) end there so I've done fixup in
>> > > slpeel_tree_duplicate_loop_to_edge_cfg instead for the cases
>> > > we can run into.
>> > >
>> > > There's still NULLs in the SLP_TREE_CHILDREN vectors and I'm
>> > > not sure it's possible to eliminate them all this stage1 so the
>> > > patch has quite some checks for this case all over the place.
>> > >
>> > > Bootstrapped and tested on x86_64-unknown-linux-gnu.  SPEC CPU 2017
>> > > and SPEC CPU 2006 successfully built and tested.
>> > >
>> > > Will push soon.
>> > >
>> > > Richard.
>> > >
>> > > 2020-10-27  Richard Biener  <rguent...@suse.de>
>> > >
>> > >         * gimple.h (gimple_expr_type): For PHIs return the type
>> > >         of the result.
>> > >         * tree-vect-loop-manip.c 
>> > > (slpeel_tree_duplicate_loop_to_edge_cfg):
>> > >         Make sure edge order into copied loop headers line up with the
>> > >         originals.
>> > >         * tree-vect-loop.c (vect_transform_cycle_phi): Handle nested
>> > >         loops with SLP.
>> > >         (vectorizable_phi): New function.
>> > >         (vectorizable_live_operation): For BB vectorization compute 
>> > > insert
>> > >         location here.
>> > >         * tree-vect-slp.c (vect_free_slp_tree): Deal with NULL
>> > >         SLP_TREE_CHILDREN entries.
>> > >         (vect_create_new_slp_node): Add overloads with pre-existing node
>> > >         argument.
>> > >         (vect_print_slp_graph): Likewise.
>> > >         (vect_mark_slp_stmts): Likewise.
>> > >         (vect_mark_slp_stmts_relevant): Likewise.
>> > >         (vect_gather_slp_loads): Likewise.
>> > >         (vect_optimize_slp): Likewise.
>> > >         (vect_slp_analyze_node_operations): Likewise.
>> > >         (vect_bb_slp_scalar_cost): Likewise.
>> > >         (vect_remove_slp_scalar_calls): Likewise.
>> > >         (vect_get_and_check_slp_defs): Handle PHIs.
>> > >         (vect_build_slp_tree_1): Handle PHIs.
>> > >         (vect_build_slp_tree_2): Continue SLP build, following PHI
>> > >         arguments.  Fix memory leak.
>> > >         (vect_build_slp_tree): Put stub node into the hash-map so
>> > >         we can discover cycles directly.
>> > >         (vect_build_slp_instance): Set the backedge SLP def for
>> > >         reduction chains.
>> > >         (vect_analyze_slp_backedges): Remove.
>> > >         (vect_analyze_slp): Do not call it.
>> > >         (vect_slp_convert_to_external): Release 
>> > > SLP_TREE_LOAD_PERMUTATION.
>> > >         (vect_slp_analyze_node_operations): Handle stray failed
>> > >         backedge defs by failing.
>> > >         (vect_slp_build_vertices): Adjust leaf condition.
>> > >         (vect_bb_slp_mark_live_stmts): Handle PHIs, use visited
>> > >         hash-set to handle cycles.
>> > >         (vect_slp_analyze_operations): Adjust.
>> > >         (vect_bb_partition_graph_r): Likewise.
>> > >         (vect_slp_function): Adjust split condition to allow CFG
>> > >         merges.
>> > >         (vect_schedule_slp_instance): Rename to ...
>> > >         (vect_schedule_slp_node): ... this.  Move DFS walk to ...
>> > >         (vect_schedule_scc): ... this new function.
>> > >         (vect_schedule_slp): Call it.  Remove ad-hoc vectorized
>> > >         backedge fill code.
>> > >         * tree-vect-stmts.c (vect_analyze_stmt): Call
>> > >         vectorizable_phi.
>> > >         (vect_transform_stmt): Likewise.
>> > >         (vect_is_simple_use): Handle vect_backedge_def.
>> > >         * tree-vectorizer.c (vec_info::new_stmt_vec_info): Only
>> > >         set loop header PHIs to vect_unknown_def_type for loop
>> > >         vectorization.
>> > >         * tree-vectorizer.h (enum vect_def_type): Add vect_backedge_def.
>> > >         (enum stmt_vec_info_type): Add phi_info_type.
>> > >         (vectorizable_phi): Declare.
>> > >
>> > >         * gcc.dg/vect/bb-slp-54.c: New test.
>> > >         * gcc.dg/vect/bb-slp-55.c: Likewise.
>> > >         * gcc.dg/vect/bb-slp-56.c: Likewise.
>> > >         * gcc.dg/vect/bb-slp-57.c: Likewise.
>> > >         * gcc.dg/vect/bb-slp-58.c: Likewise.
>> > >         * gcc.dg/vect/bb-slp-59.c: Likewise.
>> > >         * gcc.dg/vect/bb-slp-60.c: Likewise.
>> > >         * gcc.dg/vect/bb-slp-61.c: Likewise.
>> > >         * gcc.dg/vect/bb-slp-62.c: Likewise.
>> > >         * gcc.dg/vect/bb-slp-63.c: Likewise.
>> > >         * gcc.dg/vect/bb-slp-64.c: Likewise.
>> > >         * gcc.dg/vect/bb-slp-65.c: Likewise.
>> > >         * gcc.dg/vect/bb-slp-66.c: Likewise.
>> > >         * gcc.dg/vect/vect-outer-slp-1.c: Likewise.
>> > >         * gfortran.dg/vect/O3-bb-slp-1.f: Likewise.
>> > >         * gfortran.dg/vect/O3-bb-slp-2.f: Likewise.
>> > >         * g++.dg/vect/simd-11.cc: Likewise.
>> >
>> >
>> > I've filed https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97616
>> > because bb-slp-58.c and bb-slp-59.c fail on arm.
>> >
>> 
>> And it causes a regression on aarch64:
>> FAIL:    gcc.target/aarch64/sve/reduc_strict_4.c -march=armv8.2-a+sve
>> scan-assembler-times \\tfadda\\td[0-9]+, p[0-7], d[0-9]+, z[0-9]+\\.d
>> 4
>
> I'm testing a fix for this one.
>
>> FAIL:    gcc.target/aarch64/sve/reduc_strict_5.c -march=armv8.2-a+sve
>> scan-assembler-times \\tfadda\\td[0-9]+, p[0-7], d[0-9]+, z[0-9]+\\.d
>> 6
>
> That one is more odd.  For some reason we arrive at
>
> /home/rguenther/src/trunk/gcc/testsuite/gcc.target/aarch64/sve/reduc_strict_5.c:10:21:
>  
> note:   vect_is_simple_use: operand 0.0, type of def: constant
> /home/rguenther/src/trunk/gcc/testsuite/gcc.target/aarch64/sve/reduc_strict_5.c:10:21:
>  
> missed:   Build SLP failed: invalid type of def for variable-length SLP 
> 0.0
>
> we previously didn't analyze the initial value (we will later not
> build a SLP node for it but the defs are still analyzed by
> vect_get_and_check_slp_defs which I can fix).  Sill I want to
> know why we cannot create a vector of all zeros.  The check is
>
>           if ((dt == vect_constant_def
>                || dt == vect_external_def)
>               && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
>               && (TREE_CODE (type) == BOOLEAN_TYPE
>                   || !can_duplicate_and_interleave_p (vinfo, stmts.length 
> (),
>                                                       type)))
>             {
>               if (dump_enabled_p ())
>                 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>                                  "Build SLP failed: invalid type of def "
>                                  "for variable-length SLP %T\n", oprnd);
>               return -1;
>             }
>
> so the reason must be the group size (12)?  It succeeds with 8.
> But then the check looks totally premature at the above point
> because for sure an all-same constant can be always duplicated?
>
> So - what breaks if we simply remove the above?  Richard?

Well, I think there are two issues:

- Should all-same constants be OK?  Agree they should, but it would
  involve extending the check and making a promise that the values
  won't change later (and so invalidate the analysis).

- Do we need to check a condition at all?  If we don't,
  duplicate_and_interleave will ICE when trying to build invalid
  variable-length constant or external definitions.  So we need
  to check it somewhere, even if it's not here.

Thanks,
Richard

Reply via email to