> -----Original Message-----
> From: Christophe Lyon <christophe.l...@linaro.org>
> Sent: Wednesday, February 24, 2021 2:17 PM
> To: Richard Biener <rguent...@suse.de>
> Cc: Tamar Christina <tamar.christ...@arm.com>; nd <n...@arm.com>; gcc
> Patches <gcc-patches@gcc.gnu.org>
> Subject: Re: [PATCH v2] middle-end slp: fix sharing of SLP only patterns.
> 
> On Wed, 24 Feb 2021 at 09:38, Richard Biener <rguent...@suse.de> wrote:
> >
> > On Tue, 23 Feb 2021, Tamar Christina wrote:
> >
> > > Hi Richi,
> > >
> > > The attached testcase ICEs due to a couple of issues.
> > > In the testcase you have two SLP instances that share the majority
> > > of their definition with each other.  One tree defines a COMPLEX_MUL
> > > sequence and the other tree a COMPLEX_FMA.
> > >
> > > The ice happens because:
> > >
> > > 1. the refcounts are wrong, in particular the FMA case doesn't
> > > correctly count the references for the COMPLEX_MUL that it consumes.
> > >
> > > 2. when the FMA is created it incorrectly assumes it can just tear
> > > apart the MUL node that it's consuming.  This is wrong and should
> > > only be done when there is no more uses of the node, in which case
> > > the vector only pattern is no longer relevant.
> > >
> > > To fix the last part the SLP only pattern reset code was moved into
> > > vect_free_slp_tree which results in cleaner code.  I also think it
> > > does belong there since that function knows when there are no more
> > > uses of the node and so the pattern should be unmarked, so when the
> > > the vectorizer is inspecting the BB it doesn't find the now invalid vector
> only patterns.
> > >
> > > The patch also clears the SLP_TREE_REPRESENTATIVE when stores are
> > > removed such that we don't hit an error later trying to free the
> stmt_vec_info again.
> > >
> > > Lastly it also tweaks the results of whether a pattern has been
> > > detected or not to return true when another SLP instance has created
> > > a pattern that is only used by a different instance (due to the trees 
> > > being
> unshared).
> > >
> > > Instead of ICEing this code now produces
> > >
> > >         adrp    x1, .LANCHOR0
> > >         add     x2, x1, :lo12:.LANCHOR0
> > >         movi    v1.2s, 0
> > >         mov     w0, 0
> > >         ldr     x4, [x1, #:lo12:.LANCHOR0]
> > >         ldrsw   x3, [x2, 16]
> > >         ldr     x1, [x2, 8]
> > >         ldrsw   x2, [x2, 20]
> > >         ldr     d0, [x4]
> > >         ldr     d2, [x1, x3, lsl 3]
> > >         fcmla   v2.2s, v0.2s, v0.2s, #0
> > >         fcmla   v2.2s, v0.2s, v0.2s, #90
> > >         str     d2, [x1, x3, lsl 3]
> > >         fcmla   v1.2s, v0.2s, v0.2s, #0
> > >         fcmla   v1.2s, v0.2s, v0.2s, #90
> > >         str     d1, [x1, x2, lsl 3]
> > >         ret
> > >
> > > PS. This testcase actually shows that the codegen we get in these
> > > cases is not optimal. It should generate a MUL + ADD instead MUL + FMA.
> > >
> > > But that's for GCC 12.
> > >
> > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Hi Tamar,
> 
> The new test fails on aarch64_be:
> FAIL: g++.dg/vect/pr99149.cc  -std=c++14  scan-tree-dump-times slp2
> "stmt.*COMPLEX_FMA" 1
> FAIL: g++.dg/vect/pr99149.cc  -std=c++17  scan-tree-dump-times slp2
> "stmt.*COMPLEX_FMA" 1
> FAIL: g++.dg/vect/pr99149.cc  -std=c++2a  scan-tree-dump-times slp2
> "stmt.*COMPLEX_FMA" 1
> FAIL: g++.dg/vect/pr99149.cc  -std=c++98  scan-tree-dump-times slp2
> "stmt.*COMPLEX_FMA" 1
> 
> Is that supposed to work, or should you disable the test on aarch64_be?

Args, it's a bit complicated, it's blocked on AArch64 big-endian but not SVE 
big-endian.

I'll disable all be for now.

Thanks,
Tamar
> 
> Christophe
> 
> >
> > OK.
> >
> > Thanks,
> > Richard.
> >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > >       PR tree-optimization/99149
> > >       * tree-vect-slp-patterns.c (vect_detect_pair_op): Don't recreate the
> > >       buffer.
> > >       (vect_slp_reset_pattern): Remove.
> > >       (complex_fma_pattern::matches): Remove call to
> vect_slp_reset_pattern.
> > >       (complex_mul_pattern::build, complex_fma_pattern::build,
> > >       complex_fms_pattern::build): Fix ref counts.
> > >       * tree-vect-slp.c (vect_free_slp_tree): Undo SLP only pattern
> relevancy
> > >       when node is being deleted.
> > >       (vect_match_slp_patterns_2): Correct result of cache hit on 
> > > patterns.
> > >       (vect_schedule_slp): Invalidate SLP_TREE_REPRESENTATIVE of
> removed
> > >       stores.
> > >       * tree-vectorizer.c (vec_info::new_stmt_vec_info): Initialize value.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >       PR tree-optimization/99149
> > >       * g++.dg/vect/pr99149.cc: New test.
> > >
> > > --- inline copy of patch --
> > > diff --git a/gcc/testsuite/g++.dg/vect/pr99149.cc
> > > b/gcc/testsuite/g++.dg/vect/pr99149.cc
> > > new file mode 100755
> > > index
> > >
> 0000000000000000000000000000000000000000..902a26f576fcc79d2802bec093
> > > 668674cca1c84f
> > > --- /dev/null
> > > +++ b/gcc/testsuite/g++.dg/vect/pr99149.cc
> > > @@ -0,0 +1,28 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-additional-options "-w -O3 -march=armv8.3-a
> > > +-fdump-tree-slp-all" { target { aarch64*-*-* } } } */
> > > +
> > > +class a {
> > > +  float b;
> > > +  float c;
> > > +
> > > +public:
> > > +  a(float d, float e) : b(d), c(e) {}
> > > +  a operator+(a d) { return a(b + d.b, c + d.c); }
> > > +  a operator*(a d) { return a(b * b - c * c, b * c + c * d.b); } };
> > > +int f, g; class {
> > > +  a *h;
> > > +  a *i;
> > > +
> > > +public:
> > > +  void j() {
> > > +    a k = h[0], l = i[g], m = k * i[f];
> > > +    i[g] = l + m;
> > > +    i[f] = m;
> > > +  }
> > > +} n;
> > > +main() { n.j(); }
> > > +
> > > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_MUL" 1 "slp2" }
> > > +} */
> > > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_FMA" 1 "slp2" }
> > > +} */
> > > diff --git a/gcc/tree-vect-slp-patterns.c
> > > b/gcc/tree-vect-slp-patterns.c index
> > >
> f0817da9f622d22e3df2e30410d1cf610b4ffa1d..1e2769662a54229ab8e24390f9
> > > 7dfe206f17ab57 100644
> > > --- a/gcc/tree-vect-slp-patterns.c
> > > +++ b/gcc/tree-vect-slp-patterns.c
> > > @@ -407,9 +407,8 @@ vect_detect_pair_op (slp_tree node1, slp_tree
> > > node2, lane_permutation_t &lanes,
> > >
> > >    if (result != CMPLX_NONE && ops != NULL)
> > >      {
> > > -      ops->create (2);
> > > -      ops->quick_push (node1);
> > > -      ops->quick_push (node2);
> > > +      ops->safe_push (node1);
> > > +      ops->safe_push (node2);
> > >      }
> > >    return result;
> > >  }
> > > @@ -1090,15 +1089,17 @@ complex_mul_pattern::build (vec_info *vinfo)
> > > {
> > >    slp_tree node;
> > >    unsigned i;
> > > +  slp_tree newnode
> > > +    = vect_build_combine_node (this->m_ops[0], this->m_ops[1],
> > > + *this->m_node);  SLP_TREE_REF_COUNT (this->m_ops[2])++;
> > > +
> > >    FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (*this->m_node), i, node)
> > >      vect_free_slp_tree (node);
> > >
> > >    /* First re-arrange the children.  */
> > >    SLP_TREE_CHILDREN (*this->m_node).reserve_exact (2);
> > >    SLP_TREE_CHILDREN (*this->m_node)[0] = this->m_ops[2];
> > > -  SLP_TREE_CHILDREN (*this->m_node)[1] =
> > > -    vect_build_combine_node (this->m_ops[0], this->m_ops[1], *this-
> >m_node);
> > > -  SLP_TREE_REF_COUNT (this->m_ops[2])++;
> > > +  SLP_TREE_CHILDREN (*this->m_node)[1] = newnode;
> > >
> > >    /* And then rewrite the node itself.  */
> > >    complex_pattern::build (vinfo);
> > > @@ -1133,18 +1134,6 @@ class complex_fma_pattern : public
> complex_pattern
> > >      }
> > >  };
> > >
> > > -/* Helper function to "reset" a previously matched node and undo the
> changes
> > > -   made enough so that the node is treated as an irrelevant node.  */
> > > -
> > > -static inline void
> > > -vect_slp_reset_pattern (slp_tree node) -{
> > > -  stmt_vec_info stmt_info = vect_orig_stmt
> (SLP_TREE_REPRESENTATIVE
> > > (node));
> > > -  STMT_VINFO_IN_PATTERN_P (stmt_info) = false;
> > > -  STMT_SLP_TYPE (stmt_info) = pure_slp;
> > > -  SLP_TREE_REPRESENTATIVE (node) = stmt_info; -}
> > > -
> > >  /* Pattern matcher for trying to match complex multiply and accumulate
> > >     and multiply and subtract patterns in SLP tree.
> > >     If the operation matches then IFN is set to the operation it
> > > matched and @@ -1208,15 +1197,6 @@ complex_fma_pattern::matches
> (complex_operation_t op,
> > >    if (!vect_pattern_validate_optab (ifn, vnode))
> > >      return IFN_LAST;
> > >
> > > -  /* FMA matched ADD + CMUL.  During the matching of CMUL the
> > > -     stmt that starts the pattern is marked as being in a pattern,
> > > -     namely the CMUL.  When replacing this with a CFMA we have to
> > > -     unmark this statement as being in a pattern.  This is because
> > > -     vect_mark_pattern_stmts will only mark the current stmt as being
> > > -     in a pattern.  Later on when the scalar stmts are examined the
> > > -     old statement which is supposed to be irrelevant will point to
> > > -     CMUL unless we undo the pattern relationship here.  */
> > > -  vect_slp_reset_pattern (node);
> > >    ops->truncate (0);
> > >    ops->create (3);
> > >
> > > @@ -1259,10 +1239,17 @@ complex_fma_pattern::recognize
> > > (slp_tree_to_load_perm_map_t *perm_cache,  void
> > > complex_fma_pattern::build (vec_info *vinfo)  {
> > > +  slp_tree node = SLP_TREE_CHILDREN (*this->m_node)[1];
> > > +
> > >    SLP_TREE_CHILDREN (*this->m_node).release ();
> > >    SLP_TREE_CHILDREN (*this->m_node).create (3);
> > >    SLP_TREE_CHILDREN (*this->m_node).safe_splice (this->m_ops);
> > >
> > > +  SLP_TREE_REF_COUNT (this->m_ops[1])++;  SLP_TREE_REF_COUNT
> > > + (this->m_ops[2])++;
> > > +
> > > +  vect_free_slp_tree (node);
> > > +
> > >    complex_pattern::build (vinfo);
> > >  }
> > >
> > > @@ -1427,6 +1414,11 @@ complex_fms_pattern::build (vec_info *vinfo)
> > > {
> > >    slp_tree node;
> > >    unsigned i;
> > > +  slp_tree newnode =
> > > +    vect_build_combine_node (this->m_ops[2], this->m_ops[3],
> > > + *this->m_node);  SLP_TREE_REF_COUNT (this->m_ops[0])++;
> > > + SLP_TREE_REF_COUNT (this->m_ops[1])++;
> > > +
> > >    FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (*this->m_node), i, node)
> > >      vect_free_slp_tree (node);
> > >
> > > @@ -1436,10 +1428,7 @@ complex_fms_pattern::build (vec_info *vinfo)
> > >    /* First re-arrange the children.  */
> > >    SLP_TREE_CHILDREN (*this->m_node).quick_push (this->m_ops[0]);
> > >    SLP_TREE_CHILDREN (*this->m_node).quick_push (this->m_ops[1]);
> > > -  SLP_TREE_CHILDREN (*this->m_node).quick_push (
> > > -    vect_build_combine_node (this->m_ops[2], this->m_ops[3], *this-
> >m_node));
> > > -  SLP_TREE_REF_COUNT (this->m_ops[0])++;
> > > -  SLP_TREE_REF_COUNT (this->m_ops[1])++;
> > > +  SLP_TREE_CHILDREN (*this->m_node).quick_push (newnode);
> > >
> > >    /* And then rewrite the node itself.  */
> > >    complex_pattern::build (vinfo);
> > > diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c index
> > >
> ea8a97b01c6371791ac66de3e1dabfedee69cb67..605873714a5cafaaf822f61f1f
> > > 769f96b3876694 100644
> > > --- a/gcc/tree-vect-slp.c
> > > +++ b/gcc/tree-vect-slp.c
> > > @@ -146,6 +146,16 @@ vect_free_slp_tree (slp_tree node)
> > >      if (child)
> > >        vect_free_slp_tree (child);
> > >
> > > +  /* If the node defines any SLP only patterns then those patterns are no
> > > +     longer valid and should be removed.  */  stmt_vec_info
> > > + rep_stmt_info = SLP_TREE_REPRESENTATIVE (node);  if (rep_stmt_info
> > > + && STMT_VINFO_SLP_VECT_ONLY_PATTERN (rep_stmt_info))
> > > +    {
> > > +      stmt_vec_info stmt_info = vect_orig_stmt (rep_stmt_info);
> > > +      STMT_VINFO_IN_PATTERN_P (stmt_info) = false;
> > > +      STMT_SLP_TYPE (stmt_info) = STMT_SLP_TYPE (rep_stmt_info);
> > > +    }
> > > +
> > >    delete node;
> > >  }
> > >
> > > @@ -2395,7 +2405,9 @@ vect_match_slp_patterns_2 (slp_tree
> *ref_node, vec_info *vinfo,
> > >    slp_tree node = *ref_node;
> > >    bool found_p = false;
> > >    if (!node || visited->add (node))
> > > -    return false;
> > > +    return node
> > > +        && SLP_TREE_REPRESENTATIVE (node)
> > > +        && STMT_VINFO_SLP_VECT_ONLY_PATTERN
> > > + (SLP_TREE_REPRESENTATIVE (node));
> > >
> > >    slp_tree child;
> > >    FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child) @@ -6451,6
> > > +6463,11 @@ vect_schedule_slp (vec_info *vinfo, vec<slp_instance>
> slp_instances)
> > >         store_info = vect_orig_stmt (store_info);
> > >         /* Free the attached stmt_vec_info and remove the stmt.  */
> > >         vinfo->remove_stmt (store_info);
> > > +
> > > +       /* Invalidate SLP_TREE_REPRESENTATIVE in case we released it
> > > +          to not crash in vect_free_slp_tree later.  */
> > > +       if (SLP_TREE_REPRESENTATIVE (root) == store_info)
> > > +         SLP_TREE_REPRESENTATIVE (root) = NULL;
> > >          }
> > >      }
> > >  }
> > > diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c index
> > >
> 5b45df3a4e00266b7530eb4da6985f0d940cb05b..63ba594f2276850a00fc37207
> 2
> > > d98326891f19e6 100644
> > > --- a/gcc/tree-vectorizer.c
> > > +++ b/gcc/tree-vectorizer.c
> > > @@ -695,6 +695,7 @@ vec_info::new_stmt_vec_info (gimple *stmt)
> > >    STMT_VINFO_REDUC_FN (res) = IFN_LAST;
> > >    STMT_VINFO_REDUC_IDX (res) = -1;
> > >    STMT_VINFO_SLP_VECT_ONLY (res) = false;
> > > +  STMT_VINFO_SLP_VECT_ONLY_PATTERN (res) = false;
> > >    STMT_VINFO_VEC_STMTS (res) = vNULL;
> > >
> > >    if (is_a <loop_vec_info> (this)
> > >
> > >
> > >
> >
> > --
> > Richard Biener <rguent...@suse.de>
> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
> > Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

Reply via email to