On Fri, 30 Oct 2020, Richard Sandiford wrote:

> Richard Biener <rguent...@suse.de> writes:
> > On Wed, 28 Oct 2020, Christophe Lyon wrote:
> >
> >> On Wed, 28 Oct 2020 at 11:27, Christophe Lyon
> >> <christophe.l...@linaro.org> wrote:
> >> >
> >> > On Tue, 27 Oct 2020 at 13:18, Richard Biener <rguent...@suse.de> wrote:
> >> > >
> >> > > This makes SLP discovery detect backedges by seeding the bst_map with
> >> > > the node to be analyzed so it can be picked up from recursive calls.
> >> > > This removes the need to discover backedges in a separate walk.
> >> > >
> >> > > This enables SLP build to handle PHI nodes in full, continuing
> >> > > the SLP build to non-backedges.  For loop vectorization this
> >> > > enables outer loop vectorization of nested SLP cycles and for
> >> > > BB vectorization this enables vectorization of PHIs at CFG merges.
> >> > >
> >> > > It also turns code generation into a SCC discovery walk to handle
> >> > > irreducible regions and nodes only reachable via backedges where
> >> > > we now also fill in vectorized backedge defs.
> >> > >
> >> > > This requires sanitizing the SLP tree for SLP reduction chains even
> >> > > more, manually filling the backedge SLP def.
> >> > >
> >> > > This also exposes the fact that CFG copying (and edge splitting
> >> > > until I fixed that) ends up with different edge order in the
> >> > > copy which doesn't play well with the desired 1:1 mapping of
> >> > > SLP PHI node children and edges for epilogue vectorization.
> >> > > I've tried to fixup CFG copying here but this really looks
> >> > > like a dead (or expensive) end there so I've done fixup in
> >> > > slpeel_tree_duplicate_loop_to_edge_cfg instead for the cases
> >> > > we can run into.
> >> > >
> >> > > There's still NULLs in the SLP_TREE_CHILDREN vectors and I'm
> >> > > not sure it's possible to eliminate them all this stage1 so the
> >> > > patch has quite some checks for this case all over the place.
> >> > >
> >> > > Bootstrapped and tested on x86_64-unknown-linux-gnu.  SPEC CPU 2017
> >> > > and SPEC CPU 2006 successfully built and tested.
> >> > >
> >> > > Will push soon.
> >> > >
> >> > > Richard.
> >> > >
> >> > > 2020-10-27  Richard Biener  <rguent...@suse.de>
> >> > >
> >> > >         * gimple.h (gimple_expr_type): For PHIs return the type
> >> > >         of the result.
> >> > >         * tree-vect-loop-manip.c 
> >> > > (slpeel_tree_duplicate_loop_to_edge_cfg):
> >> > >         Make sure edge order into copied loop headers line up with the
> >> > >         originals.
> >> > >         * tree-vect-loop.c (vect_transform_cycle_phi): Handle nested
> >> > >         loops with SLP.
> >> > >         (vectorizable_phi): New function.
> >> > >         (vectorizable_live_operation): For BB vectorization compute 
> >> > > insert
> >> > >         location here.
> >> > >         * tree-vect-slp.c (vect_free_slp_tree): Deal with NULL
> >> > >         SLP_TREE_CHILDREN entries.
> >> > >         (vect_create_new_slp_node): Add overloads with pre-existing 
> >> > > node
> >> > >         argument.
> >> > >         (vect_print_slp_graph): Likewise.
> >> > >         (vect_mark_slp_stmts): Likewise.
> >> > >         (vect_mark_slp_stmts_relevant): Likewise.
> >> > >         (vect_gather_slp_loads): Likewise.
> >> > >         (vect_optimize_slp): Likewise.
> >> > >         (vect_slp_analyze_node_operations): Likewise.
> >> > >         (vect_bb_slp_scalar_cost): Likewise.
> >> > >         (vect_remove_slp_scalar_calls): Likewise.
> >> > >         (vect_get_and_check_slp_defs): Handle PHIs.
> >> > >         (vect_build_slp_tree_1): Handle PHIs.
> >> > >         (vect_build_slp_tree_2): Continue SLP build, following PHI
> >> > >         arguments.  Fix memory leak.
> >> > >         (vect_build_slp_tree): Put stub node into the hash-map so
> >> > >         we can discover cycles directly.
> >> > >         (vect_build_slp_instance): Set the backedge SLP def for
> >> > >         reduction chains.
> >> > >         (vect_analyze_slp_backedges): Remove.
> >> > >         (vect_analyze_slp): Do not call it.
> >> > >         (vect_slp_convert_to_external): Release 
> >> > > SLP_TREE_LOAD_PERMUTATION.
> >> > >         (vect_slp_analyze_node_operations): Handle stray failed
> >> > >         backedge defs by failing.
> >> > >         (vect_slp_build_vertices): Adjust leaf condition.
> >> > >         (vect_bb_slp_mark_live_stmts): Handle PHIs, use visited
> >> > >         hash-set to handle cycles.
> >> > >         (vect_slp_analyze_operations): Adjust.
> >> > >         (vect_bb_partition_graph_r): Likewise.
> >> > >         (vect_slp_function): Adjust split condition to allow CFG
> >> > >         merges.
> >> > >         (vect_schedule_slp_instance): Rename to ...
> >> > >         (vect_schedule_slp_node): ... this.  Move DFS walk to ...
> >> > >         (vect_schedule_scc): ... this new function.
> >> > >         (vect_schedule_slp): Call it.  Remove ad-hoc vectorized
> >> > >         backedge fill code.
> >> > >         * tree-vect-stmts.c (vect_analyze_stmt): Call
> >> > >         vectorizable_phi.
> >> > >         (vect_transform_stmt): Likewise.
> >> > >         (vect_is_simple_use): Handle vect_backedge_def.
> >> > >         * tree-vectorizer.c (vec_info::new_stmt_vec_info): Only
> >> > >         set loop header PHIs to vect_unknown_def_type for loop
> >> > >         vectorization.
> >> > >         * tree-vectorizer.h (enum vect_def_type): Add 
> >> > > vect_backedge_def.
> >> > >         (enum stmt_vec_info_type): Add phi_info_type.
> >> > >         (vectorizable_phi): Declare.
> >> > >
> >> > >         * gcc.dg/vect/bb-slp-54.c: New test.
> >> > >         * gcc.dg/vect/bb-slp-55.c: Likewise.
> >> > >         * gcc.dg/vect/bb-slp-56.c: Likewise.
> >> > >         * gcc.dg/vect/bb-slp-57.c: Likewise.
> >> > >         * gcc.dg/vect/bb-slp-58.c: Likewise.
> >> > >         * gcc.dg/vect/bb-slp-59.c: Likewise.
> >> > >         * gcc.dg/vect/bb-slp-60.c: Likewise.
> >> > >         * gcc.dg/vect/bb-slp-61.c: Likewise.
> >> > >         * gcc.dg/vect/bb-slp-62.c: Likewise.
> >> > >         * gcc.dg/vect/bb-slp-63.c: Likewise.
> >> > >         * gcc.dg/vect/bb-slp-64.c: Likewise.
> >> > >         * gcc.dg/vect/bb-slp-65.c: Likewise.
> >> > >         * gcc.dg/vect/bb-slp-66.c: Likewise.
> >> > >         * gcc.dg/vect/vect-outer-slp-1.c: Likewise.
> >> > >         * gfortran.dg/vect/O3-bb-slp-1.f: Likewise.
> >> > >         * gfortran.dg/vect/O3-bb-slp-2.f: Likewise.
> >> > >         * g++.dg/vect/simd-11.cc: Likewise.
> >> >
> >> >
> >> > I've filed https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97616
> >> > because bb-slp-58.c and bb-slp-59.c fail on arm.
> >> >
> >> 
> >> And it causes a regression on aarch64:
> >> FAIL:    gcc.target/aarch64/sve/reduc_strict_4.c -march=armv8.2-a+sve
> >> scan-assembler-times \\tfadda\\td[0-9]+, p[0-7], d[0-9]+, z[0-9]+\\.d
> >> 4
> >
> > I'm testing a fix for this one.
> >
> >> FAIL:    gcc.target/aarch64/sve/reduc_strict_5.c -march=armv8.2-a+sve
> >> scan-assembler-times \\tfadda\\td[0-9]+, p[0-7], d[0-9]+, z[0-9]+\\.d
> >> 6
> >
> > That one is more odd.  For some reason we arrive at
> >
> > /home/rguenther/src/trunk/gcc/testsuite/gcc.target/aarch64/sve/reduc_strict_5.c:10:21:
> >  
> > note:   vect_is_simple_use: operand 0.0, type of def: constant
> > /home/rguenther/src/trunk/gcc/testsuite/gcc.target/aarch64/sve/reduc_strict_5.c:10:21:
> >  
> > missed:   Build SLP failed: invalid type of def for variable-length SLP 
> > 0.0
> >
> > we previously didn't analyze the initial value (we will later not
> > build a SLP node for it but the defs are still analyzed by
> > vect_get_and_check_slp_defs which I can fix).  Sill I want to
> > know why we cannot create a vector of all zeros.  The check is
> >
> >           if ((dt == vect_constant_def
> >                || dt == vect_external_def)
> >               && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
> >               && (TREE_CODE (type) == BOOLEAN_TYPE
> >                   || !can_duplicate_and_interleave_p (vinfo, stmts.length 
> > (),
> >                                                       type)))
> >             {
> >               if (dump_enabled_p ())
> >                 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> >                                  "Build SLP failed: invalid type of def "
> >                                  "for variable-length SLP %T\n", oprnd);
> >               return -1;
> >             }
> >
> > so the reason must be the group size (12)?  It succeeds with 8.
> > But then the check looks totally premature at the above point
> > because for sure an all-same constant can be always duplicated?
> >
> > So - what breaks if we simply remove the above?  Richard?
> 
> Well, I think there are two issues:
> 
> - Should all-same constants be OK?  Agree they should, but it would
>   involve extending the check and making a promise that the values
>   won't change later (and so invalidate the analysis).
> 
> - Do we need to check a condition at all?  If we don't,
>   duplicate_and_interleave will ICE when trying to build invalid
>   variable-length constant or external definitions.  So we need
>   to check it somewhere, even if it's not here.

OK, I assumed it was present elsewhere already (for non-SLP vect).

Doing the check at its current place makes sense if the "error"
can be mitigated by swapping [parent] operands or by splitting
the SLP graph at other lanes.  Possibly the latter could help
here, but it's a conflict with a more precise check.  A more
precise check would be where we call vectorizable_* which
I admit doesn't yet exist for constants / invariants which we
simply assume can always generate.

There's similar "premature" checking for vect vs. vect and
vect vs. scalar shift capabilities in vect_build_slp_tree_1
which there helps to eventually shuffle the correct stmts
into better matching operand positions ...

So - it would probably help to "duplicate" the check to
vect_slp_analyze_node_operations (where we call vectorizable_*)
because we do generate externals in other places where we
do _not_ have this kind of check (vect_slp_convert_to_external
and several places in vect_build_slp_tree_2 where we create
the node from scalar pieces).

Richard.

> Thanks,
> Richard
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imend

Reply via email to