Hi,

On Thu, Oct 14 2021, Richard Biener wrote:
> On Wed, 13 Oct 2021, Martin Jambor wrote:
>
>> Hi,
>>
>> On Mon, Sep 27 2021, Richard Biener via Gcc-patches wrote:
>> >
>> [...]
>> >
>> > The following is what I have pushed after re-bootstrapping and testing
>> > on x86_64-unknown-linux-gnu.
>> >
>> > Richard.
>> >
>> > From fc335f9fde40d7a20a1a6e38fd6f842ed93a039e Mon Sep 17 00:00:00 2001
>> > From: Richard Biener <rguent...@suse.de>
>> > Date: Wed, 18 Nov 2020 09:36:57 +0100
>> > Subject: [PATCH] Allow different vector types for stmt groups
>> > To: gcc-patches@gcc.gnu.org
>> >
>> > This allows vectorization (in practice non-loop vectorization) to
>> > have a stmt participate in different vector type vectorizations.
>> > It allows us to remove vect_update_shared_vectype and replace it
>> > by pushing/popping STMT_VINFO_VECTYPE from SLP_TREE_VECTYPE around
>> > vect_analyze_stmt and vect_transform_stmt.
>> >
>> > For data-ref the situation is a bit more complicated since we
>> > analyze alignment info with a specific vector type in mind which
>> > doesn't play well when that changes.
>> >
>> > So the bulk of the change is passing down the actual vector type
>> > used for a vectorized access to the various accessors of alignment
>> > info, first and foremost dr_misalignment but also aligned_access_p,
>> > known_alignment_for_access_p, vect_known_alignment_in_bytes and
>> > vect_supportable_dr_alignment. I took the liberty to replace
>> > ALL_CAPS macro accessors with the lower-case function invocations.
>> >
>> > The actual changes to the behavior are in dr_misalignment which now
>> > is the place factoring in the negative step adjustment as well as
>> > handling alignment queries for a vector type with bigger alignment
>> > requirements than what we can (or have) analyze(d).
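As a rough illustration of the dr_misalignment behavior described just above, here is a small standalone sketch. It is not the GCC code: DataRefAlign, VecType, MISALIGNMENT_UNKNOWN and misalignment_for are made-up names, and reducing the result modulo the analyzed alignment is an assumption of this sketch. It only shows the two ideas: a query for a vector type that wants more alignment than was analyzed yields "unknown", and a negative step moves the start of the vector access back by (nunits - 1) elements before the misalignment is computed.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration only; not GCC's data structures.
constexpr int MISALIGNMENT_UNKNOWN = -1;

struct DataRefAlign {
    unsigned analyzed_align;  // alignment in bytes the analysis was done for (power of two)
    int misalignment;         // byte offset from an analyzed_align boundary, or unknown
    bool negative_step;       // true if the access walks backwards through memory
};

struct VecType {
    unsigned nunits;     // number of elements in the vector
    unsigned elem_size;  // element size in bytes
    unsigned align;      // alignment in bytes this vector type asks for
};

// Misalignment of the access when performed with vector type VT,
// relative to the alignment the data reference was analyzed for.
int misalignment_for(const DataRefAlign& dr, const VecType& vt) {
    // Precondition of this sketch: the analyzed alignment is a power of two.
    assert(dr.analyzed_align != 0
           && (dr.analyzed_align & (dr.analyzed_align - 1)) == 0);

    if (dr.misalignment == MISALIGNMENT_UNKNOWN)
        return MISALIGNMENT_UNKNOWN;

    // The analysis cannot answer a query for a bigger alignment requirement.
    if (vt.align > dr.analyzed_align)
        return MISALIGNMENT_UNKNOWN;

    long offset = dr.misalignment;
    // With a negative step the vector access starts (nunits - 1) elements
    // before the address of the first scalar access.
    if (dr.negative_step)
        offset -= static_cast<long>(vt.nunits - 1) * vt.elem_size;

    // Reduce to a non-negative offset below the analyzed alignment.
    long m = offset % static_cast<long>(dr.analyzed_align);
    if (m < 0)
        m += dr.analyzed_align;
    return static_cast<int>(m);
}
```

For example, with a DR analyzed for 16-byte alignment and misalignment 0, a backwards access using a vector of four 4-byte elements starts 12 bytes earlier, so the sketch reports a misalignment of 4, while asking about a 32-byte aligned vector type reports unknown.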
>> >
>> > vect_slp_analyze_node_alignment makes use of this and upon receiving
>> > a vector type with a bigger alignment desire re-analyzes the DR
>> > with respect to it but keeps an older more precise result if possible.
>> > In this context it might be possible to do the analysis just once
>> > but instead of analyzing with respect to a specific desired alignment
>> > look for the biggest alignment we can compute a not unknown alignment.
>> >
>> > The ChangeLog includes the functional changes but not the bulk due
>> > to the alignment accessor API changes - I hope that's something good.
>> >
>> > 2021-09-17  Richard Biener  <rguent...@suse.de>
>> >
>> >         PR tree-optimization/97351
>> >         PR tree-optimization/97352
>> >         PR tree-optimization/82426
>> >         * tree-vectorizer.h (dr_misalignment): Add vector type
>> >         argument.
>> >         (aligned_access_p): Likewise.
>> >         (known_alignment_for_access_p): Likewise.
>> >         (vect_supportable_dr_alignment): Likewise.
>> >         (vect_known_alignment_in_bytes): Likewise. Refactor.
>> >         (DR_MISALIGNMENT): Remove.
>> >         (vect_update_shared_vectype): Likewise.
>> >         * tree-vect-data-refs.c (dr_misalignment): Refactor, handle
>> >         a vector type with larger alignment requirement and apply
>> >         the negative step adjustment here.
>> >         (vect_calculate_target_alignment): Remove.
>> >         (vect_compute_data_ref_alignment): Get explicit vector type
>> >         argument, do not apply a negative step alignment adjustment
>> >         here.
>> >         (vect_slp_analyze_node_alignment): Re-analyze alignment
>> >         when we re-visit the DR with a bigger desired alignment but
>> >         keep more precise results from smaller alignments.
>> >         * tree-vect-slp.c (vect_update_shared_vectype): Remove.
>> >         (vect_slp_analyze_node_operations_1): Do not update the
>> >         shared vector type on stmts.
>> >         * tree-vect-stmts.c (vect_analyze_stmt): Push/pop the
>> >         vector type of an SLP node to the representative stmt-info.
>> >         (vect_transform_stmt): Likewise.
>>
>> I have bisected an AMD zen2 10% performance regression of SPEC 2006 FP
>> 433.milc benchmark when compiled with -Ofast -march=native -flto to this
>> commit. See also:
>>
>>
>> https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=412.70.0&plot.1=289.70.0&
>>
>> I am not sure if a bugzilla bug is in order because I cannot reproduce
>> the regression either on an AMD zen3 machine or on Intel CascadeLake,
>
> It's for sure worth a PR for tracking purposes. But I've not been
> very successful in identifying regression causes on Zen2 - what perf
> points to is usually exactly the same assembly in both base and peak :/

OK, it's PR 102750 then.

Martin