Hi Richi,

This is an updated patch with your suggestion.
Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	PR tree-optimization/99220
	* tree-vect-slp.c (optimize_load_redistribution_1,
	optimize_load_redistribution): Remove node from cache when it's
	about to be deleted.

gcc/testsuite/ChangeLog:

	PR tree-optimization/99220
	* g++.dg/vect/pr99220.cc: New test.

The 02/24/2021 08:52, Richard Biener wrote:
> On Tue, 23 Feb 2021, Tamar Christina wrote:
> 
> > Hi Richi,
> > 
> > The attached testcase shows a bug where two nodes end up with the same
> > pointer.  During the loop that analyzes all the instances in
> > optimize_load_redistribution_1 we do
> > 
> >   if (value)
> >     {
> >       SLP_TREE_REF_COUNT (value)++;
> >       SLP_TREE_CHILDREN (root)[i] = value;
> >       vect_free_slp_tree (node);
> >     }
> > 
> > when doing a replacement.  When this is done and the refcount for the
> > node reaches 0, the node is freed, which allows the allocator to return
> > the same pointer again from the next call to new, which it does.
> > 
> > First instance
> > 
> >   note: node 0x5325f48 (max_nunits=1, refcnt=2)
> >   note: op: VEC_PERM_EXPR
> >   note:   { }
> >   note:   lane permutation { 0[0] 1[1] 0[2] 1[3] }
> >   note:   children 0x5325db0 0x5325200
> > 
> > Second instance
> > 
> >   note: node 0x5325f48 (max_nunits=1, refcnt=1)
> >   note: op: VEC_PERM_EXPR
> >   note:   { }
> >   note:   lane permutation { 0[0] 1[1] }
> >   note:   children 0x53255b8 0x5325530
> > 
> > This ends up with the invalid construction of
> > 
> >   note: node 0x53258e8 (max_nunits=2, refcnt=2)
> >   note: op template: slp_patt_57 = .COMPLEX_MUL (_16, _16);
> >   note:   stmt 0 _16 = _14 - _15;
> >   note:   stmt 1 _23 = _17 + _22;
> >   note:   children 0x53257d8 0x5325d28
> >   note: node 0x53257d8 (max_nunits=2, refcnt=3)
> >   note: op template: l$b_4 = MEM[(const struct a &)_3].b;
> >   note:   stmt 0 l$b_4 = MEM[(const struct a &)_3].b;
> >   note:   stmt 1 l$c_5 = MEM[(const struct a &)_3].c;
> >   note:   load permutation { 0 1 }
> >   note: node 0x5325d28 (max_nunits=2, refcnt=8)
> >   note: op template: l$b_4 = MEM[(const struct a &)_3].b;
> >   note:   stmt 0 l$b_4 = MEM[(const struct a &)_3].b;
> >   note:   stmt 1 l$c_5 = MEM[(const struct a &)_3].c;
> >   note:   stmt 2 l$b_4 = MEM[(const struct a &)_3].b;
> >   note:   stmt 3 l$c_5 = MEM[(const struct a &)_3].c;
> >   note:   load permutation { 0 1 0 1 }
> > 
> > To prevent this my initial thought was to add the temporary
> > VEC_PERM_EXPR nodes to the bst_map cache and increase their refcnt once
> > more.  However, since bst_map is keyed on scalar statements and these
> > nodes have none, we can't do that.
> > 
> > Instead I realized that load_map is really only a visited list at the
> > top level.  So instead of returning the reference, we should return
> > NULL.
> > 
> > What this means is that we report that no replacement was found at
> > that level.  This is fine since these VEC_PERM_EXPR nodes are single
> > use.  So while any other node is an indication to use the cache, a
> > VEC_PERM_EXPR is an indication to avoid it.
> 
> I don't understand really.  Waiting for the other patch to be pushed so
> I can eventually have a look, but see below.
> 
> > Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
> > 
> > Ok for master?
> > 
> > Thanks,
> > Tamar
> > 
> > gcc/ChangeLog:
> > 
> > 	PR tree-optimization/99220
> > 	* tree-vect-slp.c (optimize_load_redistribution_1): Don't use
> > 	VEC_PERM_EXPR in cache.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > 	PR tree-optimization/99220
> > 	* g++.dg/vect/pr99220.cc: New test.
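The failure mode described above can be made concrete with a small
standalone sketch (toy types only, not the GCC data structures): a
cache keyed on node addresses can outlive the node it refers to, so a
later allocation that recycles the address gets a bogus cache hit.

  #include <cstdio>
  #include <map>

  struct node { int refcnt; };

  int
  main ()
  {
    std::map<node *, int> visited;  // cache keyed on the node's address

    node *a = new node{1};
    visited[a] = 42;                // record A in the cache
    delete a;                       // refcnt reached 0; A's address is freed

    node *b = new node{1};          // allocator may hand A's address back
    if (visited.count (b) != 0)
      // B was never cached; if it reuses A's address we see A's stale entry.
      std::printf ("stale cache hit for fresh node %p\n", (void *) b);
  }

Whether the address actually gets recycled depends on the allocator,
which is why it takes a specific testcase to trigger the bug.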
> > 
> > --- inline copy of patch --
> > diff --git a/gcc/testsuite/g++.dg/vect/pr99220.cc b/gcc/testsuite/g++.dg/vect/pr99220.cc
> > new file mode 100755
> > index 0000000000000000000000000000000000000000..ff3058832b742414202a8ada0a9dafc72c9a54aa
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/vect/pr99220.cc
> > @@ -0,0 +1,29 @@
> > +/* { dg-do compile } */
> > +/* { dg-additional-options "-w -O3 -march=armv8.3-a" { target { aarch64*-*-* } } } */
> > +
> > +class a {
> > +  float b;
> > +  float c;
> > +
> > +public:
> > +  a(float d, float e) : b(d), c(e) {}
> > +  a operator+(a d) { return a(b + d.b, c + d.c); }
> > +  a operator-(a d) { return a(b - d.b, c - d.c); }
> > +  a operator*(a d) { return a(b * b - c * c, b * c + c * d.b); }
> > +};
> > +long f;
> > +a *g;
> > +class {
> > +  a *h;
> > +  long i;
> > +  a *j;
> > +
> > +public:
> > +  void k() {
> > +    a l = h[0], m = g[i], n = l * g[1], o = l * j[8];
> > +    g[i] = m + n;
> > +    g[i + 1] = m - n;
> > +    j[f] = o;
> > +  }
> > +} p;
> > +main() { p.k(); }
> > diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
> > index 605873714a5cafaaf822f61f1f769f96b3876694..e631463be8fc5b2de355e674a9c96665beb9516c 100644
> > --- a/gcc/tree-vect-slp.c
> > +++ b/gcc/tree-vect-slp.c
> > @@ -2292,7 +2292,12 @@ optimize_load_redistribution_1 (scalar_stmts_to_slp_tree_map_t *bst_map,
> >  						 slp_tree root)
> >  {
> >    if (slp_tree *leader = load_map->get (root))
> > -    return *leader;
> > +    {
> > +      if (SLP_TREE_CODE (root) == VEC_PERM_EXPR)
> > +	return NULL;

But this will then only optimize the first occurrence.  Wouldn't it be
better to increase the refcount at

  load_map->put (root, node);

and walk load_map at the end, releasing refs owned by it like we do
for bst_map?

> > +      else
> > +	return *leader;
> > +    }
> >  
> >    load_map->put (root, NULL);
> 
> -- 
> Richard Biener <rguent...@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
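Richard's suggestion amounts to making load_map own a reference to
every node it caches, the way bst_map does, and then releasing all
map-owned references in one sweep at the end of the pass.  A rough
sketch of that ownership discipline (toy types and names, not the GCC
API):

  #include <unordered_map>

  struct node { int refcnt; };
  typedef std::unordered_map<node *, node *> load_map_t;

  void
  node_unref (node *n)
  {
    if (--n->refcnt == 0)
      delete n;	// only now may the address be handed out again
  }

  // Cache ROOT -> REPL; the map takes its own reference so ROOT cannot
  // die (and its address cannot be reused) while it is still a key.
  void
  record (load_map_t &load_map, node *root, node *repl)
  {
    root->refcnt++;
    load_map[root] = repl;
  }

  // At the end of the pass, drop the map-owned references in one
  // sweep, mirroring what is done for bst_map.
  void
  release (load_map_t &load_map)
  {
    for (auto &kv : load_map)
      node_unref (kv.first);
    load_map.clear ();
  }

The patch below takes a cheaper route: it keeps load_map as a plain
visited list, but evicts a node from it just before the node's last
reference is dropped.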
-- 

diff --git a/gcc/testsuite/g++.dg/vect/pr99220.cc b/gcc/testsuite/g++.dg/vect/pr99220.cc
new file mode 100755
index 0000000000000000000000000000000000000000..b41b2d4f0ee401732073321645f3f1f958da842d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/vect/pr99220.cc
@@ -0,0 +1,29 @@
+/* { dg-do compile { target { aarch64*-*-* } } } */
+/* { dg-additional-options "-w -O3 -march=armv8.3-a" } */
+
+class a {
+  float b;
+  float c;
+
+public:
+  a(float d, float e) : b(d), c(e) {}
+  a operator+(a d) { return a(b + d.b, c + d.c); }
+  a operator-(a d) { return a(b - d.b, c - d.c); }
+  a operator*(a d) { return a(b * b - c * c, b * c + c * d.b); }
+};
+long f;
+a *g;
+class {
+  a *h;
+  long i;
+  a *j;
+
+public:
+  void k() {
+    a l = h[0], m = g[i], n = l * g[1], o = l * j[8];
+    g[i] = m + n;
+    g[i + 1] = m - n;
+    j[f] = o;
+  }
+} p;
+main() { p.k(); }
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 605873714a5cafaaf822f61f1f769f96b3876694..09a0f42410e3542c7823d70757550c22b8ef676f 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -2351,6 +2351,12 @@ next:
 	{
 	  SLP_TREE_REF_COUNT (value)++;
 	  SLP_TREE_CHILDREN (root)[i] = value;
+	  /* ??? We know the original leafs of the replaced nodes will
+	     be referenced by bst_map, only the permutes created by
+	     pattern matching are not.  */
+	  if (SLP_TREE_REF_COUNT (node) == 1)
+	    load_map->remove (node);
+
 	  vect_free_slp_tree (node);
 	}
     }
@@ -2383,6 +2389,12 @@ optimize_load_redistribution (scalar_stmts_to_slp_tree_map_t *bst_map,
 	{
 	  SLP_TREE_REF_COUNT (value)++;
 	  SLP_TREE_CHILDREN (root)[i] = value;
+	  /* ??? We know the original leafs of the replaced nodes will
+	     be referenced by bst_map, only the permutes created by
+	     pattern matching are not.  */
+	  if (SLP_TREE_REF_COUNT (node) == 1)
+	    load_map->remove (node);
+
 	  vect_free_slp_tree (node);
 	}
     }
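In isolation, the rule both hunks implement is: when the child being
replaced holds its last reference, purge it from the visited map before
freeing it, so the map never keeps a key whose address the allocator
can hand out again.  As a toy sketch (assumed names, not the GCC
functions):

  #include <unordered_map>

  struct node { int refcnt; };
  typedef std::unordered_map<node *, node *> load_map_t;

  void
  free_node (node *n)
  {
    if (--n->refcnt == 0)
      delete n;			// the address becomes reusable here
  }

  void
  replace_child (load_map_t &load_map, node *&child, node *value,
		 node *old)
  {
    value->refcnt++;
    child = value;		// splice VALUE in place of OLD
    if (old->refcnt == 1)	// OLD is about to be deleted
      load_map.erase (old);	// evict it before the free
    free_node (old);
  }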