Hi,

On Thu, Oct 14 2021, Richard Biener wrote:
> On Wed, 13 Oct 2021, Martin Jambor wrote:
>
>> Hi,
>> 
>> On Mon, Sep 27 2021, Richard Biener via Gcc-patches wrote:
>> >
>> [...]
>> >
>> > The following is what I have pushed after re-bootstrapping and testing
>> > on x86_64-unknown-linux-gnu.
>> >
>> > Richard.
>> >
>> > From fc335f9fde40d7a20a1a6e38fd6f842ed93a039e Mon Sep 17 00:00:00 2001
>> > From: Richard Biener <rguent...@suse.de>
>> > Date: Wed, 18 Nov 2020 09:36:57 +0100
>> > Subject: [PATCH] Allow different vector types for stmt groups
>> > To: gcc-patches@gcc.gnu.org
>> >
>> > This allows vectorization (in practice non-loop vectorization) to
>> > have a stmt participate in different vector type vectorizations.
>> > It allows us to remove vect_update_shared_vectype and replace it
>> > by pushing/popping STMT_VINFO_VECTYPE from SLP_TREE_VECTYPE around
>> > vect_analyze_stmt and vect_transform_stmt.
>> >
>> > For data-ref the situation is a bit more complicated since we
>> > analyze alignment info with a specific vector type in mind which
>> > doesn't play well when that changes.
>> >
>> > So the bulk of the change is passing down the actual vector type
>> > used for a vectorized access to the various accessors of alignment
>> > info, first and foremost dr_misalignment but also aligned_access_p,
>> > known_alignment_for_access_p, vect_known_alignment_in_bytes and
>> > vect_supportable_dr_alignment.  I took the liberty to replace
>> > ALL_CAPS macro accessors with the lower-case function invocations.
>> >
>> > The actual changes to the behavior are in dr_misalignment which now
>> > is the place factoring in the negative step adjustment as well as
>> > handling alignment queries for a vector type with bigger alignment
>> > requirements than what we can (or have) analyze(d).
>> >
>> > vect_slp_analyze_node_alignment makes use of this and upon receiving
>> > a vector type with a bigger alignment desire re-analyzes the DR
>> > with respect to it but keeps an older more precise result if possible.
>> > In this context it might be possible to do the analysis just once
>> > but instead of analyzing with respect to a specific desired alignment
>> > look for the biggest alignment for which we can compute a not unknown
>> > alignment.
>> >
>> > The ChangeLog includes the functional changes but not the bulk due
>> > to the alignment accessor API changes - I hope that's something good.
>> >
>> > 2021-09-17  Richard Biener  <rguent...@suse.de>
>> >
>> >    PR tree-optimization/97351
>> >    PR tree-optimization/97352
>> >    PR tree-optimization/82426
>> >    * tree-vectorizer.h (dr_misalignment): Add vector type
>> >    argument.
>> >    (aligned_access_p): Likewise.
>> >    (known_alignment_for_access_p): Likewise.
>> >    (vect_supportable_dr_alignment): Likewise.
>> >    (vect_known_alignment_in_bytes): Likewise.  Refactor.
>> >    (DR_MISALIGNMENT): Remove.
>> >    (vect_update_shared_vectype): Likewise.
>> >    * tree-vect-data-refs.c (dr_misalignment): Refactor, handle
>> >    a vector type with larger alignment requirement and apply
>> >    the negative step adjustment here.
>> >    (vect_calculate_target_alignment): Remove.
>> >    (vect_compute_data_ref_alignment): Get explicit vector type
>> >    argument, do not apply a negative step alignment adjustment
>> >    here.
>> >    (vect_slp_analyze_node_alignment): Re-analyze alignment
>> >    when we re-visit the DR with a bigger desired alignment but
>> >    keep more precise results from smaller alignments.
>> >    * tree-vect-slp.c (vect_update_shared_vectype): Remove.
>> >    (vect_slp_analyze_node_operations_1): Do not update the
>> >    shared vector type on stmts.
>> >    * tree-vect-stmts.c (vect_analyze_stmt): Push/pop the
>> >    vector type of an SLP node to the representative stmt-info.
>> >    (vect_transform_stmt): Likewise.
>> 
>> I have bisected an AMD zen2 10% performance regression of SPEC 2006 FP
>> 433.milc benchmark when compiled with -Ofast -march=native -flto to this
>> commit.  See also:
>> 
>>   https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=412.70.0&plot.1=289.70.0&
>> 
>> I am not sure if a bugzilla bug is in order because I can reproduce
>> the regression neither on an AMD zen3 machine nor on Intel CascadeLake,
>
> It's for sure worth a PR for tracking purposes.  But I've not been
> very successful in identifying regression causes on Zen2 - what perf
> points to is usually exactly the same assembly in both base and peak :/

OK, it's PR 102750 then.

Martin

