On Wed, 13 Oct 2021, Martin Jambor wrote:

> Hi,
>
> On Mon, Sep 27 2021, Richard Biener via Gcc-patches wrote:
> > > [...]
> >
> > The following is what I have pushed after re-bootstrapping and testing
> > on x86_64-unknown-linux-gnu.
> >
> > Richard.
> >
> > From fc335f9fde40d7a20a1a6e38fd6f842ed93a039e Mon Sep 17 00:00:00 2001
> > From: Richard Biener <rguent...@suse.de>
> > Date: Wed, 18 Nov 2020 09:36:57 +0100
> > Subject: [PATCH] Allow different vector types for stmt groups
> > To: gcc-patches@gcc.gnu.org
> >
> > This allows vectorization (in practice non-loop vectorization) to
> > have a stmt participate in different vector type vectorizations.
> > It allows us to remove vect_update_shared_vectype and replace it
> > by pushing/popping STMT_VINFO_VECTYPE from SLP_TREE_VECTYPE around
> > vect_analyze_stmt and vect_transform_stmt.
> >
> > For data-ref the situation is a bit more complicated since we
> > analyze alignment info with a specific vector type in mind which
> > doesn't play well when that changes.
> >
> > So the bulk of the change is passing down the actual vector type
> > used for a vectorized access to the various accessors of alignment
> > info, first and foremost dr_misalignment but also aligned_access_p,
> > known_alignment_for_access_p, vect_known_alignment_in_bytes and
> > vect_supportable_dr_alignment.  I took the liberty to replace
> > ALL_CAPS macro accessors with the lower-case function invocations.
> >
> > The actual changes to the behavior are in dr_misalignment which now
> > is the place factoring in the negative step adjustment as well as
> > handling alignment queries for a vector type with bigger alignment
> > requirements than what we can (or have) analyze(d).
> >
> > vect_slp_analyze_node_alignment makes use of this and upon receiving
> > a vector type with a bigger alignment desire re-analyzes the DR
> > with respect to it but keeps an older more precise result if possible.
> > In this context it might be possible to do the analysis just once
> > but instead of analyzing with respect to a specific desired alignment
> > look for the biggest alignment we can compute a not unknown alignment.
> >
> > The ChangeLog includes the functional changes but not the bulk due
> > to the alignment accessor API changes - I hope that's something good.
> >
> > 2021-09-17  Richard Biener  <rguent...@suse.de>
> >
> >         PR tree-optimization/97351
> >         PR tree-optimization/97352
> >         PR tree-optimization/82426
> >         * tree-vectorizer.h (dr_misalignment): Add vector type
> >         argument.
> >         (aligned_access_p): Likewise.
> >         (known_alignment_for_access_p): Likewise.
> >         (vect_supportable_dr_alignment): Likewise.
> >         (vect_known_alignment_in_bytes): Likewise.  Refactor.
> >         (DR_MISALIGNMENT): Remove.
> >         (vect_update_shared_vectype): Likewise.
> >         * tree-vect-data-refs.c (dr_misalignment): Refactor, handle
> >         a vector type with larger alignment requirement and apply
> >         the negative step adjustment here.
> >         (vect_calculate_target_alignment): Remove.
> >         (vect_compute_data_ref_alignment): Get explicit vector type
> >         argument, do not apply a negative step alignment adjustment
> >         here.
> >         (vect_slp_analyze_node_alignment): Re-analyze alignment
> >         when we re-visit the DR with a bigger desired alignment but
> >         keep more precise results from smaller alignments.
> >         * tree-vect-slp.c (vect_update_shared_vectype): Remove.
> >         (vect_slp_analyze_node_operations_1): Do not update the
> >         shared vector type on stmts.
> >         * tree-vect-stmts.c (vect_analyze_stmt): Push/pop the
> >         vector type of an SLP node to the representative stmt-info.
> >         (vect_transform_stmt): Likewise.
>
> I have bisected an AMD zen2 10% performance regression of SPEC 2006 FP
> 433.milc benchmark when compiled with -Ofast -march=native -flto to this
> commit.
> See also:
>
>   https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=412.70.0&plot.1=289.70.0&
>
> I am not sure if a bugzilla bug is in order because I cannot reproduce
> the regression either on an AMD zen3 machine or on Intel CascadeLake,

It's for sure worth a PR for tracking purposes.  But I've not been
very successful in identifying regression causes on Zen2 - what perf
points to is usually exactly the same assembly in both base and
peak :/

Richard.

> because of the history of the benchmark performance and because I know
> milc can be sensitive to conditions outside our control.  And the list of
> dependencies of PR 26163 is long enough as it is.  OTOH, the regression
> reproduces reliably for me.
>
> Some relevant perf data:
>
> BEFORE:
> # Samples: 585K of event 'cycles:u'
> # Event count (approx.): 472738682838
> #
> # Overhead       Samples  Command          Shared Object           Symbol
> # ........  ............  ...............  ......................  .........................................
> #
>     24.59%        140397  milc_peak.mine-  milc_peak.mine-lto-nat  [.] u_shift_fermion
>     18.47%        105497  milc_peak.mine-  milc_peak.mine-lto-nat  [.] add_force_to_mom
>     15.97%         96343  milc_peak.mine-  milc_peak.mine-lto-nat  [.] mult_su3_na
>     15.29%         90027  milc_peak.mine-  milc_peak.mine-lto-nat  [.] mult_su3_nn
>      5.55%         35114  milc_peak.mine-  milc_peak.mine-lto-nat  [.] path_product
>      4.75%         27693  milc_peak.mine-  milc_peak.mine-lto-nat  [.] compute_gen_staple
>      2.76%         16109  milc_peak.mine-  milc_peak.mine-lto-nat  [.] mult_su3_an
>      2.42%         14255  milc_peak.mine-  milc_peak.mine-lto-nat  [.] imp_gauge_force.constprop.0
>      2.02%         11561  milc_peak.mine-  milc_peak.mine-lto-nat  [.] mult_adj_su3_mat_4vec
>
> AFTER:
> # Samples: 634K of event 'cycles:u'
> # Event count (approx.): 513635733685
> #
> # Overhead       Samples  Command          Shared Object           Symbol
> # ........  ............  ...............  ......................  .........................................
> #
>     24.04%        149010  milc_peak.mine-  milc_peak.mine-lto-nat  [.] add_force_to_mom
>     23.76%        147370  milc_peak.mine-  milc_peak.mine-lto-nat  [.] u_shift_fermion
>     14.19%         90929  milc_peak.mine-  milc_peak.mine-lto-nat  [.] mult_su3_nn
>     14.14%         92912  milc_peak.mine-  milc_peak.mine-lto-nat  [.] mult_su3_na
>      4.90%         33846  milc_peak.mine-  milc_peak.mine-lto-nat  [.] path_product
>      3.89%         24621  milc_peak.mine-  milc_peak.mine-lto-nat  [.] mult_su3_an
>      3.62%         22831  milc_peak.mine-  milc_peak.mine-lto-nat  [.] compute_gen_staple
>      2.05%         13215  milc_peak.mine-  milc_peak.mine-lto-nat  [.] imp_gauge_force.constprop.0
>
>
> Martin
>

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)