On Wed, 13 Oct 2021, Martin Jambor wrote:

> Hi,
> 
> On Mon, Sep 27 2021, Richard Biener via Gcc-patches wrote:
> >
> [...]
> >
> > The following is what I have pushed after re-bootstrapping and testing
> > on x86_64-unknown-linux-gnu.
> >
> > Richard.
> >
> > From fc335f9fde40d7a20a1a6e38fd6f842ed93a039e Mon Sep 17 00:00:00 2001
> > From: Richard Biener <rguent...@suse.de>
> > Date: Wed, 18 Nov 2020 09:36:57 +0100
> > Subject: [PATCH] Allow different vector types for stmt groups
> > To: gcc-patches@gcc.gnu.org
> >
> > This allows vectorization (in practice non-loop vectorization) to
> > have a stmt participate in different vector type vectorizations.
> > It allows us to remove vect_update_shared_vectype and replace it
> > by pushing/popping STMT_VINFO_VECTYPE from SLP_TREE_VECTYPE around
> > vect_analyze_stmt and vect_transform_stmt.
> >
> > For data-ref the situation is a bit more complicated since we
> > analyze alignment info with a specific vector type in mind which
> > doesn't play well when that changes.
> >
> > So the bulk of the change is passing down the actual vector type
> > used for a vectorized access to the various accessors of alignment
> > info, first and foremost dr_misalignment but also aligned_access_p,
> > known_alignment_for_access_p, vect_known_alignment_in_bytes and
> > vect_supportable_dr_alignment.  I took the liberty to replace
> > ALL_CAPS macro accessors with the lower-case function invocations.
> >
> > The actual changes to the behavior are in dr_misalignment which now
> > is the place factoring in the negative step adjustment as well as
> > handling alignment queries for a vector type with bigger alignment
> > requirements than what we can (or have) analyze(d).
> >
> > vect_slp_analyze_node_alignment makes use of this and upon receiving
> > a vector type with a bigger alignment desire re-analyzes the DR
> > with respect to it but keeps an older more precise result if possible.
> > In this context it might be possible to do the analysis just once
> > but, instead of analyzing with respect to a specific desired alignment,
> > look for the biggest alignment for which we can compute a not unknown
> > misalignment.
> >
> > The ChangeLog includes the functional changes but not the bulk due
> > to the alignment accessor API changes - I hope that's something good.
> >
> > 2021-09-17  Richard Biener  <rguent...@suse.de>
> >
> >     PR tree-optimization/97351
> >     PR tree-optimization/97352
> >     PR tree-optimization/82426
> >     * tree-vectorizer.h (dr_misalignment): Add vector type
> >     argument.
> >     (aligned_access_p): Likewise.
> >     (known_alignment_for_access_p): Likewise.
> >     (vect_supportable_dr_alignment): Likewise.
> >     (vect_known_alignment_in_bytes): Likewise.  Refactor.
> >     (DR_MISALIGNMENT): Remove.
> >     (vect_update_shared_vectype): Likewise.
> >     * tree-vect-data-refs.c (dr_misalignment): Refactor, handle
> >     a vector type with larger alignment requirement and apply
> >     the negative step adjustment here.
> >     (vect_calculate_target_alignment): Remove.
> >     (vect_compute_data_ref_alignment): Get explicit vector type
> >     argument, do not apply a negative step alignment adjustment
> >     here.
> >     (vect_slp_analyze_node_alignment): Re-analyze alignment
> >     when we re-visit the DR with a bigger desired alignment but
> >     keep more precise results from smaller alignments.
> >     * tree-vect-slp.c (vect_update_shared_vectype): Remove.
> >     (vect_slp_analyze_node_operations_1): Do not update the
> >     shared vector type on stmts.
> >     * tree-vect-stmts.c (vect_analyze_stmt): Push/pop the
> >     vector type of an SLP node to the representative stmt-info.
> >     (vect_transform_stmt): Likewise.
> 
> I have bisected a 10% performance regression on AMD zen2 of the SPEC
> 2006 FP 433.milc benchmark, when compiled with -Ofast -march=native
> -flto, to this commit.  See also:
> 
>   https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=412.70.0&plot.1=289.70.0&;
> 
> I am not sure if a bugzilla bug is in order because I can reproduce
> the regression neither on an AMD zen3 machine nor on Intel CascadeLake,

It's for sure worth a PR for tracking purposes.  But I've not been
very successful in identifying regression causes on Zen2 - what perf
points to is usually exactly the same assembly in both base and peak :/

Richard.

> because of the history of the benchmark performance and because I know milc
> can be sensitive to conditions outside our control.  And the list of
> dependencies of PR 26163 is long enough as it is.  OTOH, the regression
> reproduces reliably for me.
> 
> Some relevant perf data:
> 
> BEFORE:
> # Samples: 585K of event 'cycles:u'
> # Event count (approx.): 472738682838
> #
> # Overhead       Samples  Command          Shared Object           Symbol
> # ........  ............  ...............  ......................  .........................................
> #
>     24.59%        140397  milc_peak.mine-  milc_peak.mine-lto-nat  [.] u_shift_fermion
>     18.47%        105497  milc_peak.mine-  milc_peak.mine-lto-nat  [.] add_force_to_mom
>     15.97%         96343  milc_peak.mine-  milc_peak.mine-lto-nat  [.] mult_su3_na
>     15.29%         90027  milc_peak.mine-  milc_peak.mine-lto-nat  [.] mult_su3_nn
>      5.55%         35114  milc_peak.mine-  milc_peak.mine-lto-nat  [.] path_product
>      4.75%         27693  milc_peak.mine-  milc_peak.mine-lto-nat  [.] compute_gen_staple
>      2.76%         16109  milc_peak.mine-  milc_peak.mine-lto-nat  [.] mult_su3_an
>      2.42%         14255  milc_peak.mine-  milc_peak.mine-lto-nat  [.] imp_gauge_force.constprop.0
>      2.02%         11561  milc_peak.mine-  milc_peak.mine-lto-nat  [.] mult_adj_su3_mat_4vec
> 
> AFTER:
> # Samples: 634K of event 'cycles:u'
> # Event count (approx.): 513635733685
> #
> # Overhead       Samples  Command          Shared Object           Symbol
> # ........  ............  ...............  ......................  .........................................
> #
>     24.04%        149010  milc_peak.mine-  milc_peak.mine-lto-nat  [.] add_force_to_mom
>     23.76%        147370  milc_peak.mine-  milc_peak.mine-lto-nat  [.] u_shift_fermion
>     14.19%         90929  milc_peak.mine-  milc_peak.mine-lto-nat  [.] mult_su3_nn
>     14.14%         92912  milc_peak.mine-  milc_peak.mine-lto-nat  [.] mult_su3_na
>      4.90%         33846  milc_peak.mine-  milc_peak.mine-lto-nat  [.] path_product
>      3.89%         24621  milc_peak.mine-  milc_peak.mine-lto-nat  [.] mult_su3_an
>      3.62%         22831  milc_peak.mine-  milc_peak.mine-lto-nat  [.] compute_gen_staple
>      2.05%         13215  milc_peak.mine-  milc_peak.mine-lto-nat  [.] imp_gauge_force.constprop.0
> 
> 
> Martin
> 
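
For the PR it may help to diff the two profiles quoted above
mechanically.  The following is only a rough sketch and was not used
for the analysis in this thread: it assumes every perf report row sits
on a single line (overhead, samples, command, shared object, [.],
symbol), and the helper names parse_report and sample_deltas are made
up for illustration.  Only two rows per profile are copied in to keep
it short.

```python
import re

# A data row of `perf report --stdio`:
#   overhead%  samples  command  shared-object  [.]  symbol
ROW = re.compile(r"^\s*[\d.]+%\s+(\d+)\s+\S+\s+\S+\s+\[\.\]\s+(\S+)\s*$")


def parse_report(text):
    """Map symbol -> sample count; comment and header lines are skipped."""
    samples = {}
    for line in text.splitlines():
        match = ROW.match(line.lstrip("> "))  # tolerate mail quoting
        if match:
            samples[match.group(2)] = int(match.group(1))
    return samples


def sample_deltas(before, after):
    """(symbol, after - before) for symbols in both reports, largest growth first."""
    common = before.keys() & after.keys()
    deltas = [(sym, after[sym] - before[sym]) for sym in common]
    return sorted(deltas, key=lambda item: item[1], reverse=True)


# Two rows from each profile in this mail (hypothetical variable names).
BEFORE = """\
# Overhead       Samples  Command          Shared Object           Symbol
    24.59%        140397  milc_peak.mine-  milc_peak.mine-lto-nat  [.] u_shift_fermion
    18.47%        105497  milc_peak.mine-  milc_peak.mine-lto-nat  [.] add_force_to_mom
"""

AFTER = """\
# Overhead       Samples  Command          Shared Object           Symbol
    24.04%        149010  milc_peak.mine-  milc_peak.mine-lto-nat  [.] add_force_to_mom
    23.76%        147370  milc_peak.mine-  milc_peak.mine-lto-nat  [.] u_shift_fermion
"""

for symbol, delta in sample_deltas(parse_report(BEFORE), parse_report(AFTER)):
    print(f"{delta:+8d}  {symbol}")
```

On those rows it lists add_force_to_mom first with 43513 more samples
and u_shift_fermion second with 6973 more.  Symbols that appear in only
one report (for example because they fell below the display cutoff) are
left out on purpose, and sample counts alone of course say nothing
about why a function got slower.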

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
