Re: [PATCH] tree-optimization/113073 - amend PR112736 fix

Richard Biener Wed, 20 Dec 2023 05:45:51 -0800

On Wed, 20 Dec 2023, Richard Biener wrote:

> On Wed, 20 Dec 2023, Thomas Schwinge wrote:
> 
> > Hi!
> > 
> > On 2023-12-19T13:30:58+0100, Richard Biener <rguent...@suse.de> wrote:
> > > The PR112736 testcase fails on RISC-V because the aligned exception
> > > uses the wrong check.  The alignment support scheme can be
> > > dr_aligned even when the access isn't aligned to the vector size
> > > but some targets are happy with element alignment.  The following
> > > fixes that.
> > >
> > > Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
> > 
> > I've noticed that this regresses the GCN target as follows:
> > 
> >     PASS: gcc.dg/vect/bb-slp-pr78205.c (test for excess errors)
> >     PASS: gcc.dg/vect/bb-slp-pr78205.c scan-tree-dump-times slp2 "optimized: basic block" 3
> >     PASS: gcc.dg/vect/bb-slp-pr78205.c scan-tree-dump-times slp2 "BB vectorization with gaps at the end of a load is not supported" 1
> >     [-PASS:-]{+FAIL:+} gcc.dg/vect/bb-slp-pr78205.c scan-tree-dump-times optimized " = c\\[4\\];" 1
> > 
> > As so often, I've got no clue whether that's a vectorizer, GCN back end,
> > or test case issue.  ;-)
> > 
> > 'diff'ing before vs. after:
> > 
> >     --- bb-slp-pr78205.c.191t.slp2        2023-12-20 09:49:45.834344620 +0100
> >     +++ bb-slp-pr78205.c.191t.slp2        2023-12-20 09:10:14.706300941 +0100
> >     [...]
> >     @@ -505,8 +505,9 @@
> >      [...]/bb-slp-pr78205.c:9:8: note: create vector_type-pointer variable to type: vector(4) double  vectorizing a pointer ref: c[0]
> >      [...]/bb-slp-pr78205.c:9:8: note: created &c[0]
> >      [...]/bb-slp-pr78205.c:9:8: note: add new stmt: vect__1.7_19 = MEM <vector(4) double> [(double *)&c];
> >     -[...]/bb-slp-pr78205.c:9:8: note: add new stmt: vect__1.8_20 = MEM <vector(4) double> [(double *)&c + 32B];
> >     -[...]/bb-slp-pr78205.c:9:8: note: add new stmt: vect__1.9_21 = VEC_PERM_EXPR <vect__1.7_19, vect__1.7_19, { 0, 1, 0, 1 }>;
> >     +[...]/bb-slp-pr78205.c:9:8: note: add new stmt: _20 = MEM[(double *)&c + 32B];
> >     +[...]/bb-slp-pr78205.c:9:8: note: add new stmt: vect__1.8_21 = {_20, 0.0, 0.0, 0.0};
> >     +[...]/bb-slp-pr78205.c:9:8: note: add new stmt: vect__1.9_22 = VEC_PERM_EXPR <vect__1.7_19, vect__1.7_19, { 0, 1, 0, 1 }>;
> >      [...]/bb-slp-pr78205.c:9:8: note: ------>vectorizing SLP node starting from: a[0] = _1;
> >      [...]/bb-slp-pr78205.c:9:8: note: vect_is_simple_use: operand c[0], type of def: internal
> >      [...]/bb-slp-pr78205.c:9:8: note: vect_is_simple_use: operand c[1], type of def: internal
> >     [...]
> >     @@ -537,9 +538,10 @@
> >      [...]/bb-slp-pr78205.c:13:8: note: transform load. ncopies = 1
> >      [...]/bb-slp-pr78205.c:13:8: note: create vector_type-pointer variable to type: vector(4) double  vectorizing a pointer ref: c[2]
> >      [...]/bb-slp-pr78205.c:13:8: note: created &c[2]
> >     -[...]/bb-slp-pr78205.c:13:8: note: add new stmt: vect__3.14_23 = MEM <vector(4) double> [(double *)&c];
> >     -[...]/bb-slp-pr78205.c:13:8: note: add new stmt: vect__3.15_24 = MEM <vector(4) double> [(double *)&c + 32B];
> >     -[...]/bb-slp-pr78205.c:13:8: note: add new stmt: vect__1.16_25 = VEC_PERM_EXPR <vect__3.14_23, vect__3.14_23, { 2, 3, 2, 3 }>;
> >     +[...]/bb-slp-pr78205.c:13:8: note: add new stmt: vect__3.14_24 = MEM <vector(4) double> [(double *)&c];
> >     +[...]/bb-slp-pr78205.c:13:8: note: add new stmt: _25 = MEM[(double *)&c + 32B];
> >     +[...]/bb-slp-pr78205.c:13:8: note: add new stmt: vect__3.15_26 = {_25, 0.0, 0.0, 0.0};
> >     +[...]/bb-slp-pr78205.c:13:8: note: add new stmt: vect__1.16_27 = VEC_PERM_EXPR <vect__3.14_24, vect__3.14_24, { 2, 3, 2, 3 }>;
> >      [...]/bb-slp-pr78205.c:13:8: note: ------>vectorizing SLP node starting from: b[0] = _3;
> >      [...]/bb-slp-pr78205.c:13:8: note: vect_is_simple_use: operand c[2], type of def: internal
> >      [...]/bb-slp-pr78205.c:13:8: note: vect_is_simple_use: operand c[3], type of def: internal
> >     [...]
> >     @@ -580,18 +582,22 @@
> >        double _4;
> >        double _5;
> >        vector(2) double _17;
> >     +  double _20;
> >     +  double _25;
> > 
> >        <bb 2> [local count: 1073741824]:
> >        vect__1.7_19 = MEM <vector(4) double> [(double *)&c];
> >     -  vect__1.9_21 = VEC_PERM_EXPR <vect__1.7_19, vect__1.7_19, { 0, 1, 0, 1 }>;
> >     +  _20 = MEM[(double *)&c + 32B];
> >     +  vect__1.9_22 = VEC_PERM_EXPR <vect__1.7_19, vect__1.7_19, { 0, 1, 0, 1 }>;
> >        _1 = c[0];
> >        _2 = c[1];
> >     -  MEM <vector(4) double> [(double *)&a] = vect__1.9_21;
> >     -  vect__3.14_23 = MEM <vector(4) double> [(double *)&c];
> >     -  vect__1.16_25 = VEC_PERM_EXPR <vect__3.14_23, vect__3.14_23, { 2, 3, 2, 3 }>;
> >     +  MEM <vector(4) double> [(double *)&a] = vect__1.9_22;
> >     +  vect__3.14_24 = MEM <vector(4) double> [(double *)&c];
> >     +  _25 = MEM[(double *)&c + 32B];
> 
> that looks like a noop (but it's odd we keep the unused load)
> 
> >     +  vect__1.16_27 = VEC_PERM_EXPR <vect__3.14_24, vect__3.14_24, { 2, 3, 2, 3 }>;
> >        _3 = c[2];
> >        _4 = c[3];
> >     -  MEM <vector(4) double> [(double *)&b] = vect__1.16_25;
> >     +  MEM <vector(4) double> [(double *)&b] = vect__1.16_27;
> >        _5 = c[4];
> >        _17 = {_5, _5};
> >        MEM <vector(2) double> [(double *)&x] = _17;
> > 
> >     --- bb-slp-pr78205.c.265t.optimized   2023-12-20 09:49:45.838344586 +0100
> >     +++ bb-slp-pr78205.c.265t.optimized   2023-12-20 09:10:14.706300941 +0100
> >     @@ -6,17 +6,17 @@
> >        vector(4) double vect__1.16;
> >        vector(4) double vect__1.9;
> >        vector(4) double vect__1.7;
> >     -  double _5;
> >        vector(2) double _17;
> >     +  double _20;
> > 
> >        <bb 2> [local count: 1073741824]:
> >        vect__1.7_19 = MEM <vector(4) double> [(double *)&c];
> >     -  vect__1.9_21 = VEC_PERM_EXPR <vect__1.7_19, vect__1.7_19, { 0, 1, 0, 1 }>;
> >     -  MEM <vector(4) double> [(double *)&a] = vect__1.9_21;
> >     -  vect__1.16_25 = VEC_PERM_EXPR <vect__1.7_19, vect__1.7_19, { 2, 3, 2, 3 }>;
> >     -  MEM <vector(4) double> [(double *)&b] = vect__1.16_25;
> >     -  _5 = c[4];
> >     -  _17 = {_5, _5};
> >     +  _20 = MEM[(double *)&c + 32B];
> 
> That looks similar in the end, but it trades c[4] for a MEM here.  That's
> because we CSEd c[4] with the "dead" load above.
> 
> I think we could simply remove that extra scan-tree-dump-times.  We
> should instead check that we do _not_ see a vector load starting at c[4],
> but scan-tree-dump-not is also not very reliable (in a false-negative
> sense).
> 
> I'll see what that "dead" load is.


Should be fixed by r14-6748-ga8f0278ade1353
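
For readers following the dumps, here is a minimal C sketch of the access
pattern they show.  It is reconstructed from the statements visible in the
quoted slp2 and optimized dumps, so treat it as an assumption about the
shape of the test, not a copy of gcc.dg/vect/bb-slp-pr78205.c.  The helper
c4_byte_offset is hypothetical and only illustrates why the scalar load
MEM[(double *)&c + 32B] reads the same element as c[4] when double is 8
bytes wide.

```c
#include <stddef.h>

/* Arrays named as in the dumps; the sizes are inferred from the vector
   types and the c[4] access, so they are an assumption.  */
double x[2], a[4], b[4], c[5];

/* Two 4-element stores fed by permuted loads of c[0..3], plus a
   2-element store that splats c[4].  */
void
foo (void)
{
  a[0] = c[0];
  a[1] = c[1];
  a[2] = c[0];
  a[3] = c[1];
  b[0] = c[2];
  b[1] = c[3];
  b[2] = c[2];
  b[3] = c[3];
  x[0] = c[4];
  x[1] = c[4];
}

/* Hypothetical helper: byte offset of c[4] from the start of c.  */
size_t
c4_byte_offset (void)
{
  return (size_t) ((char *) &c[4] - (char *) &c[0]);
}
```

Once the leftover scalar load of c + 32B is CSEd with the c[4] load, the
optimized dump prints the MEM form, which would explain why the
" = c\\[4\\];" scan no longer matches on GCN.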

Richard.
