On Tue, 23 Feb 2021, Tamar Christina wrote:

> Hi Richi,
>
> The attached testcase shows a bug where two nodes end up with the same
> pointer.  During the loop that analyzes all the instances in
> optimize_load_redistribution_1 we do
>
>   if (value)
>     {
>       SLP_TREE_REF_COUNT (value)++;
>       SLP_TREE_CHILDREN (root)[i] = value;
>       vect_free_slp_tree (node);
>     }
>
> when doing a replacement.  When this is done and the refcount for the
> node reaches 0, the node is freed, which allows the allocator to hand
> out the same pointer again on the next call to new -- which it does.
>
> First instance
>
> note: node 0x5325f48 (max_nunits=1, refcnt=2)
> note: op: VEC_PERM_EXPR
> note: { }
> note: lane permutation { 0[0] 1[1] 0[2] 1[3] }
> note: children 0x5325db0 0x5325200
>
> Second instance
>
> note: node 0x5325f48 (max_nunits=1, refcnt=1)
> note: op: VEC_PERM_EXPR
> note: { }
> note: lane permutation { 0[0] 1[1] }
> note: children 0x53255b8 0x5325530
>
> This ends up with the illegal construction of
>
> note: node 0x53258e8 (max_nunits=2, refcnt=2)
> note: op template: slp_patt_57 = .COMPLEX_MUL (_16, _16);
> note:   stmt 0 _16 = _14 - _15;
> note:   stmt 1 _23 = _17 + _22;
> note:   children 0x53257d8 0x5325d28
> note: node 0x53257d8 (max_nunits=2, refcnt=3)
> note: op template: l$b_4 = MEM[(const struct a &)_3].b;
> note:   stmt 0 l$b_4 = MEM[(const struct a &)_3].b;
> note:   stmt 1 l$c_5 = MEM[(const struct a &)_3].c;
> note:   load permutation { 0 1 }
> note: node 0x5325d28 (max_nunits=2, refcnt=8)
> note: op template: l$b_4 = MEM[(const struct a &)_3].b;
> note:   stmt 0 l$b_4 = MEM[(const struct a &)_3].b;
> note:   stmt 1 l$c_5 = MEM[(const struct a &)_3].c;
> note:   stmt 2 l$b_4 = MEM[(const struct a &)_3].b;
> note:   stmt 3 l$c_5 = MEM[(const struct a &)_3].c;
> note:   load permutation { 0 1 0 1 }
>
> To prevent this my initial thought was to add the temporary
> VEC_PERM_EXPR nodes to the bst_map cache and increase their refcnt by
> one more.
> However, since bst_map is keyed on scalar statements and these nodes
> have none, we can't do that.
>
> Instead I realized that load_map is really only a visited list at the
> top level.  So instead of returning the cached reference for a
> VEC_PERM_EXPR node, we should return NULL.
>
> Returning NULL means that no replacement was found at that level.  This
> is fine since these VEC_PERM_EXPR nodes are single use.  So while any
> other node is an indication to use the cache, a VEC_PERM_EXPR is an
> indication to avoid it.
I don't understand, really.  Waiting for the other patch to be pushed so
I can eventually have a look, but see below.

> Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> 	PR tree-optimization/99220
> 	* tree-vect-slp.c (optimize_load_redistribution_1): Don't use
> 	VEC_PERM_EXPR in cache.
>
> gcc/testsuite/ChangeLog:
>
> 	PR tree-optimization/99220
> 	* g++.dg/vect/pr99220.cc: New test.
>
> --- inline copy of patch ---
> diff --git a/gcc/testsuite/g++.dg/vect/pr99220.cc b/gcc/testsuite/g++.dg/vect/pr99220.cc
> new file mode 100755
> index 0000000000000000000000000000000000000000..ff3058832b742414202a8ada0a9dafc72c9a54aa
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/vect/pr99220.cc
> @@ -0,0 +1,29 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-w -O3 -march=armv8.3-a" { target { aarch64*-*-* } } } */
> +
> +class a {
> +  float b;
> +  float c;
> +
> +public:
> +  a(float d, float e) : b(d), c(e) {}
> +  a operator+(a d) { return a(b + d.b, c + d.c); }
> +  a operator-(a d) { return a(b - d.b, c - d.c); }
> +  a operator*(a d) { return a(b * b - c * c, b * c + c * d.b); }
> +};
> +long f;
> +a *g;
> +class {
> +  a *h;
> +  long i;
> +  a *j;
> +
> +public:
> +  void k() {
> +    a l = h[0], m = g[i], n = l * g[1], o = l * j[8];
> +    g[i] = m + n;
> +    g[i + 1] = m - n;
> +    j[f] = o;
> +  }
> +} p;
> +main() { p.k(); }
> diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
> index 605873714a5cafaaf822f61f1f769f96b3876694..e631463be8fc5b2de355e674a9c96665beb9516c 100644
> --- a/gcc/tree-vect-slp.c
> +++ b/gcc/tree-vect-slp.c
> @@ -2292,7 +2292,12 @@ optimize_load_redistribution_1 (scalar_stmts_to_slp_tree_map_t *bst_map,
>  				slp_tree root)
>  {
>    if (slp_tree *leader = load_map->get (root))
> -    return *leader;
> +    {
> +      if (SLP_TREE_CODE (root) == VEC_PERM_EXPR)
> +	return NULL;

But this will then only optimize the first occurrence.
Wouldn't it be better to increase the refcount at

  load_map->put (root, node);

and walk load_map at the end, releasing the references owned by it, like
we do for bst_map?

> +      else
> +	return *leader;
> +    }
>
>    load_map->put (root, NULL);

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)