On Tue, Sep 21, 2021 at 10:55 PM H.J. Lu <hjl.to...@gmail.com> wrote:
>
> On Mon, Sep 20, 2021 at 5:15 AM Richard Biener via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > This allows vectorization (in practice non-loop vectorization) to
> > have a stmt participate in different vector type vectorizations.
> > It allows us to remove vect_update_shared_vectype and replace it
> > by pushing/popping STMT_VINFO_VECTYPE from SLP_TREE_VECTYPE around
> > vect_analyze_stmt and vect_transform_stmt.
> >
> > For data-ref the situation is a bit more complicated since we
> > analyze alignment info with a specific vector type in mind which
> > doesn't play well when that changes.
> >
> > So the bulk of the change is passing down the actual vector type
> > used for a vectorized access to the various accessors of alignment
> > info, first and foremost dr_misalignment but also aligned_access_p,
> > known_alignment_for_access_p, vect_known_alignment_in_bytes and
> > vect_supportable_dr_alignment. I took the liberty to replace
> > ALL_CAPS macro accessors with the lower-case function invocations.
> >
> > The actual changes to the behavior are in dr_misalignment which now
> > is the place factoring in the negative step adjustment as well as
> > handling alignment queries for a vector type with bigger alignment
> > requirements than what we can (or have) analyze(d).
> >
> > vect_slp_analyze_node_alignment makes use of this and upon receiving
> > a vector type with a bigger alignment desire re-analyzes the DR
> > with respect to it but keeps an older more precise result if possible.
> > In this context it might be possible to do the analysis just once
> > but instead of analyzing with respect to a specific desired alignment
> > look for the biggest alignment we can compute a not unknown alignment.
> >
> > The ChangeLog includes the functional changes but not the bulk due
> > to the alignment accessor API changes - I hope that's something good.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, testing on SPEC
> > CPU 2017 in progress (for stats and correctness).
> >
> > Any comments?
> >
> > Thanks,
> > Richard.
> >
> > 2021-09-17 Richard Biener <rguent...@suse.de>
> >
> >         PR tree-optimization/97351
> >         PR tree-optimization/97352
> >         PR tree-optimization/82426
> >         * tree-vectorizer.h (dr_misalignment): Add vector type
> >         argument.
> >         (aligned_access_p): Likewise.
> >         (known_alignment_for_access_p): Likewise.
> >         (vect_supportable_dr_alignment): Likewise.
> >         (vect_known_alignment_in_bytes): Likewise. Refactor.
> >         (DR_MISALIGNMENT): Remove.
> >         (vect_update_shared_vectype): Likewise.
> >         * tree-vect-data-refs.c (dr_misalignment): Refactor, handle
> >         a vector type with larger alignment requirement and apply
> >         the negative step adjustment here.
> >         (vect_calculate_target_alignment): Remove.
> >         (vect_compute_data_ref_alignment): Get explicit vector type
> >         argument, do not apply a negative step alignment adjustment
> >         here.
> >         (vect_slp_analyze_node_alignment): Re-analyze alignment
> >         when we re-visit the DR with a bigger desired alignment but
> >         keep more precise results from smaller alignments.
> >         * tree-vect-slp.c (vect_update_shared_vectype): Remove.
> >         (vect_slp_analyze_node_operations_1): Do not update the
> >         shared vector type on stmts.
> >         * tree-vect-stmts.c (vect_analyze_stmt): Push/pop the
> >         vector type of an SLP node to the representative stmt-info.
> >         (vect_transform_stmt): Likewise.
> >
> >         * gcc.target/i386/vect-pr82426.c: New testcase.
> >         * gcc.target/i386/vect-pr97352.c: Likewise.
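
To make the dr_misalignment description above concrete, here is a toy model
in plain C of a misalignment query that takes the vector type into account.
It is only a sketch: toy_dr, toy_vectype, toy_dr_misalignment and
TOY_MISALIGN_UNKNOWN are hypothetical names, not the vectorizer's data
structures. The size of the negative step adjustment (nunits - 1 elements)
and the final reduction modulo the requested alignment are assumptions made
for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Sentinel for "misalignment not known"; a hypothetical stand-in, not
   GCC's own constant.  */
#define TOY_MISALIGN_UNKNOWN (-1)

/* What was recorded when the data reference was analyzed (hypothetical).  */
struct toy_dr
{
  int analyzed_align;   /* Alignment in bytes the analysis was done for.  */
  int misalign;         /* Byte offset from that alignment, or unknown.  */
  bool negative_step;   /* True if the access runs backwards.  */
};

/* The vector type a particular access wants to use (hypothetical).  */
struct toy_vectype
{
  int align;      /* Required alignment in bytes, a power of two.  */
  int nunits;     /* Number of elements.  */
  int elt_size;   /* Size of one element in bytes.  */
};

/* Return the misalignment of DR in bytes for an access done with VT,
   or TOY_MISALIGN_UNKNOWN if the recorded analysis cannot answer.  */
int
toy_dr_misalignment (const struct toy_dr *dr, const struct toy_vectype *vt)
{
  assert (vt->align > 0 && (vt->align & (vt->align - 1)) == 0);

  if (dr->misalign == TOY_MISALIGN_UNKNOWN)
    return TOY_MISALIGN_UNKNOWN;

  /* The analysis says nothing about alignments bigger than the one it
     was done for.  */
  if (vt->align > dr->analyzed_align)
    return TOY_MISALIGN_UNKNOWN;

  int misalign = dr->misalign;

  /* Assumed adjustment: a backward-running access starts nunits - 1
     elements before the recorded address.  */
  if (dr->negative_step)
    misalign -= (vt->nunits - 1) * vt->elt_size;

  /* Reduce to [0, align) relative to the requested vector alignment.  */
  return ((misalign % vt->align) + vt->align) % vt->align;
}
```

With a reference analyzed for 16-byte alignment and a recorded misalignment
of 8, a query for an 8-byte aligned two-float vector type yields 0 and a
query for a 16-byte aligned four-float type yields 8. A query for a 32-byte
aligned type yields unknown, which is the case where a re-analysis with the
bigger desired alignment would be needed.
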
> > ---
> >  gcc/testsuite/gcc.target/i386/vect-pr82426.c |  32 +++
> >  gcc/testsuite/gcc.target/i386/vect-pr97352.c |  22 ++
> >  gcc/tree-vect-data-refs.c                    | 217 ++++++++++---------
> >  gcc/tree-vect-slp.c                          |  59 -----
> >  gcc/tree-vect-stmts.c                        |  77 ++++---
> >  gcc/tree-vectorizer.h                        |  32 ++-
> >  6 files changed, 231 insertions(+), 208 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/vect-pr82426.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/vect-pr97352.c
> >
> > diff --git a/gcc/testsuite/gcc.target/i386/vect-pr82426.c
> > b/gcc/testsuite/gcc.target/i386/vect-pr82426.c
> > new file mode 100644
> > index 00000000000..741a1d14d36
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/vect-pr82426.c
> > @@ -0,0 +1,32 @@
> > +/* i?86 does not have V2SF, x32 does though. */
> > +/* { dg-do compile { target { lp64 || x32 } } } */
>
> It should be target { ! ia32 }
>
> > +/* ??? With AVX512 we only realize one FMA opportunity. */
>
> Hongtao, is AVX512 missing 64-bit vector support??
>
(define_insn "fmav2sf4"
  [(set (match_operand:V2SF 0 "register_operand" "=v,v,x")
        (fma:V2SF
          (match_operand:V2SF 1 "register_operand" "%0,v,x")
          (match_operand:V2SF 2 "register_operand" "v,v,x")
          (match_operand:V2SF 3 "register_operand" "v,0,x")))]
  "(TARGET_FMA || TARGET_FMA4) && TARGET_MMX_WITH_SSE"

Need to add TARGET_AVX512VL to the condition.
I'll post a patch for this.

> > +/* { dg-options "-O3 -mavx -mfma -mno-avx512f" } */
> > +
> > +struct Matrix
> > +{
> > +  float m11;
> > +  float m12;
> > +  float m21;
> > +  float m22;
> > +  float dx;
> > +  float dy;
> > +};
> > +
> > +struct Matrix multiply(const struct Matrix *a, const struct Matrix *b)
> > +{
> > +  struct Matrix out;
> > +  out.m11 = a->m11*b->m11 + a->m12*b->m21;
> > +  out.m12 = a->m11*b->m12 + a->m12*b->m22;
> > +  out.m21 = a->m21*b->m11 + a->m22*b->m21;
> > +  out.m22 = a->m21*b->m12 + a->m22*b->m22;
> > +
> > +  out.dx = a->dx*b->m11 + a->dy*b->m21 + b->dx;
> > +  out.dy = a->dx*b->m12 + a->dy*b->m22 + b->dy;
> > +  return out;
> > +}
> > +
> > +/* The whole kernel should be vectorized with V4SF and V2SF operations. */
> > +/* { dg-final { scan-assembler-times "vadd" 1 } } */
> > +/* { dg-final { scan-assembler-times "vmul" 2 } } */
> > +/* { dg-final { scan-assembler-times "vfma" 2 } } */
>
> --
> H.J.
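
The testcase above is compile-only and just scans the assembly. A runtime
check of the same kernel could be sketched as below. The struct and
multiply are copied from the testcase; check_multiply and its input values
are hypothetical additions and not part of the posted patch. The inputs are
small integers so every product and sum is exactly representable, whether
or not the compiler contracts them into FMAs.

```c
#include <assert.h>

struct Matrix
{
  float m11;
  float m12;
  float m21;
  float m22;
  float dx;
  float dy;
};

/* Kernel copied from the testcase.  */
struct Matrix multiply(const struct Matrix *a, const struct Matrix *b)
{
  struct Matrix out;
  out.m11 = a->m11*b->m11 + a->m12*b->m21;
  out.m12 = a->m11*b->m12 + a->m12*b->m22;
  out.m21 = a->m21*b->m11 + a->m22*b->m21;
  out.m22 = a->m21*b->m12 + a->m22*b->m22;

  out.dx = a->dx*b->m11 + a->dy*b->m21 + b->dx;
  out.dy = a->dx*b->m12 + a->dy*b->m22 + b->dy;
  return out;
}

/* Hypothetical runtime check: B is an identity matrix plus a translation
   of (10, 20), so the 2x2 part of A must come back unchanged and the
   translation of A must be shifted by B's translation.  */
void check_multiply(void)
{
  const struct Matrix a = { 1, 2, 3, 4, 5, 6 };
  const struct Matrix b = { 1, 0, 0, 1, 10, 20 };
  struct Matrix out = multiply(&a, &b);

  assert(out.m11 == 1 && out.m12 == 2);
  assert(out.m21 == 3 && out.m22 == 4);
  assert(out.dx == 15 && out.dy == 26);
}
```

Calling check_multiply from main in a build with the testcase's options and
in one without vectorization gives a cheap sanity check that both produce
the same values.
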

--
BR,
Hongtao