On Tue, 23 Feb 2021, Tamar Christina wrote:

> Hi Richi,
> 
> The attached testcase shows a bug where two nodes end up with the same 
> pointer.
> During the loop in optimize_load_redistribution_1 that analyzes all
> the instances we do
> 
>       if (value)
>         {
>           SLP_TREE_REF_COUNT (value)++;
>           SLP_TREE_CHILDREN (root)[i] = value;
>           vect_free_slp_tree (node);
>         }
> 
> when doing a replacement.  When this is done and the refcount for the
> node reaches 0, the node is freed, which allows the allocator to hand
> out the very same pointer again on the next call to new.  That is
> exactly what happens here (a stand-alone illustration of the hazard
> follows the dumps below):
> 
> First instance
> 
> note:   node 0x5325f48 (max_nunits=1, refcnt=2)
> note:   op: VEC_PERM_EXPR
> note:           { }
> note:           lane permutation { 0[0] 1[1] 0[2] 1[3] }
> note:           children 0x5325db0 0x5325200
> 
> Second instance
> 
> note:   node 0x5325f48 (max_nunits=1, refcnt=1)
> note:   op: VEC_PERM_EXPR
> note:           { }
> note:           lane permutation { 0[0] 1[1] }
> note:           children 0x53255b8 0x5325530
> 
> This will end up with the illegal construction of
> 
> note:   node 0x53258e8 (max_nunits=2, refcnt=2)
> note:   op template: slp_patt_57 = .COMPLEX_MUL (_16, _16);
> note:           stmt 0 _16 = _14 - _15;
> note:           stmt 1 _23 = _17 + _22;
> note:           children 0x53257d8 0x5325d28
> note:   node 0x53257d8 (max_nunits=2, refcnt=3)
> note:   op template: l$b_4 = MEM[(const struct a &)_3].b;
> note:           stmt 0 l$b_4 = MEM[(const struct a &)_3].b;
> note:           stmt 1 l$c_5 = MEM[(const struct a &)_3].c;
> note:           load permutation { 0 1 }
> note:   node 0x5325d28 (max_nunits=2, refcnt=8)
> note:   op template: l$b_4 = MEM[(const struct a &)_3].b;
> note:           stmt 0 l$b_4 = MEM[(const struct a &)_3].b;
> note:           stmt 1 l$c_5 = MEM[(const struct a &)_3].c;
> note:           stmt 2 l$b_4 = MEM[(const struct a &)_3].b;
> note:           stmt 3 l$c_5 = MEM[(const struct a &)_3].c;
> note:           load permutation { 0 1 0 1 }
> 
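> As a stand-alone illustration of the hazard (a minimal sketch with a
> hypothetical node type, not the real slp_tree API): once the last
> reference is dropped and the node is freed, the allocator may hand the
> very same address to the next allocation, so a cache keyed on node
> pointers can report a stale hit for an unrelated node.
> 
>   #include <cstdio>
>   #include <set>
> 
>   struct node { int refcnt; };
> 
>   int main ()
>   {
>     std::set<node *> visited;     /* cache keyed on node addresses    */
>     node *a = new node { 1 };
>     visited.insert (a);           /* mark A as visited                */
>     delete a;                     /* last reference gone, A is freed  */
>     node *b = new node { 1 };     /* may reuse A's old address ...    */
>     if (visited.count (b))        /* ... so B can look visited        */
>       std::printf ("stale cache hit for %p\n", (void *) b);
>     delete b;
>   }
> 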
> To prevent this my initial thought was to add the temporary
> VEC_PERM_EXPR nodes to the bst_map cache and increase their refcnt by
> one more.  However, since bst_map is keyed on scalar statements and
> these nodes have none, we can't do that.
> 
> Instead I realized that load_map is really only a visited list at the
> top level.  So instead of returning the cached reference we should
> return NULL.
> 
> What this means is that we report that no replacement was found at
> that level.  This is fine since these VEC_PERM_EXPR nodes are single
> use.  So while a cache hit on any other node is an indication to use
> the cache, a hit on a VEC_PERM_EXPR is an indication to avoid it.

I don't understand really.  Waiting for the other patch to be pushed so
I can eventually have a look, but see below.

> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>       PR tree-optimization/99220
>       * tree-vect-slp.c (optimize_load_redistribution_1): Don't use
>       VEC_PERM_EXPR in cache.
> 
> gcc/testsuite/ChangeLog:
> 
>       PR tree-optimization/99220
>       * g++.dg/vect/pr99220.cc: New test.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/testsuite/g++.dg/vect/pr99220.cc b/gcc/testsuite/g++.dg/vect/pr99220.cc
> new file mode 100755
> index 0000000000000000000000000000000000000000..ff3058832b742414202a8ada0a9dafc72c9a54aa
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/vect/pr99220.cc
> @@ -0,0 +1,29 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-w -O3 -march=armv8.3-a" { target { aarch64*-*-* } } } */
> +
> +class a {
> +  float b;
> +  float c;
> +
> +public:
> +  a(float d, float e) : b(d), c(e) {}
> +  a operator+(a d) { return a(b + d.b, c + d.c); }
> +  a operator-(a d) { return a(b - d.b, c - d.c); }
> +  a operator*(a d) { return a(b * b - c * c, b * c + c * d.b); }
> +};
> +long f;
> +a *g;
> +class {
> +  a *h;
> +  long i;
> +  a *j;
> +
> +public:
> +  void k() {
> +    a l = h[0], m = g[i], n = l * g[1], o = l * j[8];
> +    g[i] = m + n;
> +    g[i + 1] = m - n;
> +    j[f] = o;
> +  }
> +} p;
> +main() { p.k(); }
> diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
> index 605873714a5cafaaf822f61f1f769f96b3876694..e631463be8fc5b2de355e674a9c96665beb9516c 100644
> --- a/gcc/tree-vect-slp.c
> +++ b/gcc/tree-vect-slp.c
> @@ -2292,7 +2292,12 @@ optimize_load_redistribution_1 (scalar_stmts_to_slp_tree_map_t *bst_map,
>                               slp_tree root)
>  {
>    if (slp_tree *leader = load_map->get (root))
> -    return *leader;
> +    {
> +      if (SLP_TREE_CODE (root) == VEC_PERM_EXPR)
> +     return NULL;

But this will then only optimize the first occurrence.  Wouldn't it be
better to increase the refcount at

      load_map->put (root, node);

and walk load_map at the end, releasing refs owned by it like we do
for bst_map?
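
Something like this untested sketch (assuming load_map can be walked
the same way we walk bst_map when releasing it):

      /* Have the cache own a reference so the node cannot be freed
         and its address recycled while load_map is live.  */
      SLP_TREE_REF_COUNT (node)++;
      load_map->put (root, node);

and then at the end, release the references owned by the map:

      for (auto iter : *load_map)
        if (iter.second)          /* plain visited markers are NULL  */
          vect_free_slp_tree (iter.second);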

> +      else
> +     return *leader;
> +    }
>  
>    load_map->put (root, NULL);
>  
> 
> 
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
