On Tue, Sep 21, 2021 at 10:55 PM H.J. Lu <hjl.to...@gmail.com> wrote:
>
> On Mon, Sep 20, 2021 at 5:15 AM Richard Biener via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > This allows vectorization (in practice non-loop vectorization) to
> > have a stmt participate in different vector type vectorizations.
> > It allows us to remove vect_update_shared_vectype and replace it
> > by pushing/popping STMT_VINFO_VECTYPE from SLP_TREE_VECTYPE around
> > vect_analyze_stmt and vect_transform_stmt.
> >
> > For data-ref the situation is a bit more complicated since we
> > analyze alignment info with a specific vector type in mind which
> > doesn't play well when that changes.
> >
> > So the bulk of the change is passing down the actual vector type
> > used for a vectorized access to the various accessors of alignment
> > info, first and foremost dr_misalignment but also aligned_access_p,
> > known_alignment_for_access_p, vect_known_alignment_in_bytes and
> > vect_supportable_dr_alignment.  I took the liberty to replace
> > ALL_CAPS macro accessors with the lower-case function invocations.
> >
> > The actual changes to the behavior are in dr_misalignment which now
> > is the place factoring in the negative step adjustment as well as
> > handling alignment queries for a vector type with bigger alignment
> > requirements than what we can (or have) analyze(d).
> >
> > vect_slp_analyze_node_alignment makes use of this and upon receiving
> > a vector type with a bigger alingment desire re-analyzes the DR
> > with respect to it but keeps an older more precise result if possible.
> > In this context it might be possible to do the analysis just once
> > but instead of analyzing with respect to a specific desired alignment
> > look for the biggest alignment we can compute a not unknown alignment.
> >
> > The ChangeLog includes the functional changes but not the bulk due
> > to the alignment accessor API changes - I hope that's something good.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, testing on SPEC
> > CPU 2017 in progress (for stats and correctness).
> >
> > Any comments?
> >
> > Thanks,
> > Richard.
> >
> > 2021-09-17  Richard Biener  <rguent...@suse.de>
> >
> >         PR tree-optimization/97351
> >         PR tree-optimization/97352
> >         PR tree-optimization/82426
> >         * tree-vectorizer.h (dr_misalignment): Add vector type
> >         argument.
> >         (aligned_access_p): Likewise.
> >         (known_alignment_for_access_p): Likewise.
> >         (vect_supportable_dr_alignment): Likewise.
> >         (vect_known_alignment_in_bytes): Likewise.  Refactor.
> >         (DR_MISALIGNMENT): Remove.
> >         (vect_update_shared_vectype): Likewise.
> >         * tree-vect-data-refs.c (dr_misalignment): Refactor, handle
> >         a vector type with larger alignment requirement and apply
> >         the negative step adjustment here.
> >         (vect_calculate_target_alignment): Remove.
> >         (vect_compute_data_ref_alignment): Get explicit vector type
> >         argument, do not apply a negative step alignment adjustment
> >         here.
> >         (vect_slp_analyze_node_alignment): Re-analyze alignment
> >         when we re-visit the DR with a bigger desired alignment but
> >         keep more precise results from smaller alignments.
> >         * tree-vect-slp.c (vect_update_shared_vectype): Remove.
> >         (vect_slp_analyze_node_operations_1): Do not update the
> >         shared vector type on stmts.
> >         * tree-vect-stmts.c (vect_analyze_stmt): Push/pop the
> >         vector type of an SLP node to the representative stmt-info.
> >         (vect_transform_stmt): Likewise.
> >
> >         * gcc.target/i386/vect-pr82426.c: New testcase.
> >         * gcc.target/i386/vect-pr97352.c: Likewise.
> > ---
> >  gcc/testsuite/gcc.target/i386/vect-pr82426.c |  32 +++
> >  gcc/testsuite/gcc.target/i386/vect-pr97352.c |  22 ++
> >  gcc/tree-vect-data-refs.c                    | 217 ++++++++++---------
> >  gcc/tree-vect-slp.c                          |  59 -----
> >  gcc/tree-vect-stmts.c                        |  77 ++++---
> >  gcc/tree-vectorizer.h                        |  32 ++-
> >  6 files changed, 231 insertions(+), 208 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/vect-pr82426.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/vect-pr97352.c
> >
> > diff --git a/gcc/testsuite/gcc.target/i386/vect-pr82426.c 
> > b/gcc/testsuite/gcc.target/i386/vect-pr82426.c
> > new file mode 100644
> > index 00000000000..741a1d14d36
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/vect-pr82426.c
> > @@ -0,0 +1,32 @@
> > +/* i?86 does not have V2SF, x32 does though.  */
> > +/* { dg-do compile { target { lp64 || x32 } } } */
>
> It should be target { ! ia32 }
>
> > +/* ???  With AVX512 we only realize one FMA opportunity.  */
>
> Hongtao, is AVX512 missing 64-bit vector support??
>
(define_insn "fmav2sf4"
  [(set (match_operand:V2SF 0 "register_operand" "=v,v,x")
(fma:V2SF
  (match_operand:V2SF 1 "register_operand" "%0,v,x")
  (match_operand:V2SF 2 "register_operand" "v,v,x")
  (match_operand:V2SF 3 "register_operand" "v,0,x")))]
  "(TARGET_FMA || TARGET_FMA4) && TARGET_MMX_WITH_SSE"
Need to add TARGET_AVX512VL to the condition.
I'll post a patch for this.
> > +/* { dg-options "-O3 -mavx -mfma -mno-avx512f" } */
> > +
> > +struct Matrix
> > +{
> > +  float m11;
> > +  float m12;
> > +  float m21;
> > +  float m22;
> > +  float dx;
> > +  float dy;
> > +};
> > +
> > +struct Matrix multiply(const struct Matrix *a, const struct Matrix *b)
> > +{
> > +  struct Matrix out;
> > +  out.m11 = a->m11*b->m11 + a->m12*b->m21;
> > +  out.m12 = a->m11*b->m12 + a->m12*b->m22;
> > +  out.m21 = a->m21*b->m11 + a->m22*b->m21;
> > +  out.m22 = a->m21*b->m12 + a->m22*b->m22;
> > +
> > +  out.dx = a->dx*b->m11  + a->dy*b->m21 + b->dx;
> > +  out.dy = a->dx*b->m12  + a->dy*b->m22 + b->dy;
> > +  return out;
> > +}
> > +
> > +/* The whole kernel should be vectorized with V4SF and V2SF operations.  */
> > +/* { dg-final { scan-assembler-times "vadd" 1 } } */
> > +/* { dg-final { scan-assembler-times "vmul" 2 } } */
> > +/* { dg-final { scan-assembler-times "vfma" 2 } } */
>
> --
> H.J.



-- 
BR,
Hongtao

Reply via email to