On Wed, 20 Dec 2023, Richard Biener wrote:

> On Wed, 20 Dec 2023, Thomas Schwinge wrote:
>
> > Hi!
> >
> > On 2023-12-19T13:30:58+0100, Richard Biener <rguent...@suse.de> wrote:
> > > The PR112736 testcase fails on RISC-V because the aligned exception
> > > uses the wrong check.  The alignment support scheme can be
> > > dr_aligned even when the access isn't aligned to the vector size
> > > but some targets are happy with element alignment.  The following
> > > fixes that.
> > >
> > > Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
> >
> > I've noticed this to regress the GCN target as follows:
> >
> > PASS: gcc.dg/vect/bb-slp-pr78205.c (test for excess errors)
> > PASS: gcc.dg/vect/bb-slp-pr78205.c scan-tree-dump-times slp2
> > "optimized: basic block" 3
> > PASS: gcc.dg/vect/bb-slp-pr78205.c scan-tree-dump-times slp2 "BB
> > vectorization with gaps at the end of a load is not supported" 1
> > [-PASS:-]{+FAIL:+} gcc.dg/vect/bb-slp-pr78205.c scan-tree-dump-times
> > optimized " = c\\[4\\];" 1
> >
> > As so often, I've got no clue whether that's a vectorizer, GCN back end,
> > or test case issue.  ;-)
> >
> > 'diff'ing before vs. after:
> >
> > --- bb-slp-pr78205.c.191t.slp2 2023-12-20 09:49:45.834344620
> > +0100
> > +++ bb-slp-pr78205.c.191t.slp2 2023-12-20 09:10:14.706300941
> > +0100
> > [...]
> > @@ -505,8 +505,9 @@
> > [...]/bb-slp-pr78205.c:9:8: note: create vector_type-pointer variable
> > to type: vector(4) double vectorizing a pointer ref: c[0]
> > [...]/bb-slp-pr78205.c:9:8: note: created &c[0]
> > [...]/bb-slp-pr78205.c:9:8: note: add new stmt: vect__1.7_19 = MEM
> > <vector(4) double> [(double *)&c];
> > -[...]/bb-slp-pr78205.c:9:8: note: add new stmt: vect__1.8_20 = MEM
> > <vector(4) double> [(double *)&c + 32B];
> > -[...]/bb-slp-pr78205.c:9:8: note: add new stmt: vect__1.9_21 =
> > VEC_PERM_EXPR <vect__1.7_19, vect__1.7_19, { 0, 1, 0, 1 }>;
> > +[...]/bb-slp-pr78205.c:9:8: note: add new stmt: _20 = MEM[(double *)&c
> > + 32B];
> > +[...]/bb-slp-pr78205.c:9:8: note: add new stmt: vect__1.8_21 = {_20,
> > 0.0, 0.0, 0.0};
> > +[...]/bb-slp-pr78205.c:9:8: note: add new stmt: vect__1.9_22 =
> > VEC_PERM_EXPR <vect__1.7_19, vect__1.7_19, { 0, 1, 0, 1 }>;
> > [...]/bb-slp-pr78205.c:9:8: note: ------>vectorizing SLP node starting
> > from: a[0] = _1;
> > [...]/bb-slp-pr78205.c:9:8: note: vect_is_simple_use: operand c[0],
> > type of def: internal
> > [...]/bb-slp-pr78205.c:9:8: note: vect_is_simple_use: operand c[1],
> > type of def: internal
> > [...]
> > @@ -537,9 +538,10 @@
> > [...]/bb-slp-pr78205.c:13:8: note: transform load. ncopies = 1
> > [...]/bb-slp-pr78205.c:13:8: note: create vector_type-pointer variable
> > to type: vector(4) double vectorizing a pointer ref: c[2]
> > [...]/bb-slp-pr78205.c:13:8: note: created &c[2]
> > -[...]/bb-slp-pr78205.c:13:8: note: add new stmt: vect__3.14_23 = MEM
> > <vector(4) double> [(double *)&c];
> > -[...]/bb-slp-pr78205.c:13:8: note: add new stmt: vect__3.15_24 = MEM
> > <vector(4) double> [(double *)&c + 32B];
> > -[...]/bb-slp-pr78205.c:13:8: note: add new stmt: vect__1.16_25 =
> > VEC_PERM_EXPR <vect__3.14_23, vect__3.14_23, { 2, 3, 2, 3 }>;
> > +[...]/bb-slp-pr78205.c:13:8: note: add new stmt: vect__3.14_24 = MEM
> > <vector(4) double> [(double *)&c];
> > +[...]/bb-slp-pr78205.c:13:8: note: add new stmt: _25 = MEM[(double
> > *)&c + 32B];
> > +[...]/bb-slp-pr78205.c:13:8: note: add new stmt: vect__3.15_26 = {_25,
> > 0.0, 0.0, 0.0};
> > +[...]/bb-slp-pr78205.c:13:8: note: add new stmt: vect__1.16_27 =
> > VEC_PERM_EXPR <vect__3.14_24, vect__3.14_24, { 2, 3, 2, 3 }>;
> > [...]/bb-slp-pr78205.c:13:8: note: ------>vectorizing SLP node
> > starting from: b[0] = _3;
> > [...]/bb-slp-pr78205.c:13:8: note: vect_is_simple_use: operand c[2],
> > type of def: internal
> > [...]/bb-slp-pr78205.c:13:8: note: vect_is_simple_use: operand c[3],
> > type of def: internal
> > [...]
> > @@ -580,18 +582,22 @@
> > double _4;
> > double _5;
> > vector(2) double _17;
> > + double _20;
> > + double _25;
> >
> > <bb 2> [local count: 1073741824]:
> > vect__1.7_19 = MEM <vector(4) double> [(double *)&c];
> > - vect__1.9_21 = VEC_PERM_EXPR <vect__1.7_19, vect__1.7_19, { 0, 1, 0,
> > 1 }>;
> > + _20 = MEM[(double *)&c + 32B];
> > + vect__1.9_22 = VEC_PERM_EXPR <vect__1.7_19, vect__1.7_19, { 0, 1, 0,
> > 1 }>;
> > _1 = c[0];
> > _2 = c[1];
> > - MEM <vector(4) double> [(double *)&a] = vect__1.9_21;
> > - vect__3.14_23 = MEM <vector(4) double> [(double *)&c];
> > - vect__1.16_25 = VEC_PERM_EXPR <vect__3.14_23, vect__3.14_23, { 2, 3,
> > 2, 3 }>;
> > + MEM <vector(4) double> [(double *)&a] = vect__1.9_22;
> > + vect__3.14_24 = MEM <vector(4) double> [(double *)&c];
> > + _25 = MEM[(double *)&c + 32B];
>
> that looks like a noop (but it's odd we keep the unused load)
>
> > + vect__1.16_27 = VEC_PERM_EXPR <vect__3.14_24, vect__3.14_24, { 2, 3,
> > 2, 3 }>;
> > _3 = c[2];
> > _4 = c[3];
> > - MEM <vector(4) double> [(double *)&b] = vect__1.16_25;
> > + MEM <vector(4) double> [(double *)&b] = vect__1.16_27;
> > _5 = c[4];
> > _17 = {_5, _5};
> > MEM <vector(2) double> [(double *)&x] = _17;
> >
> > --- bb-slp-pr78205.c.265t.optimized 2023-12-20 09:49:45.838344586
> > +0100
> > +++ bb-slp-pr78205.c.265t.optimized 2023-12-20 09:10:14.706300941
> > +0100
> > @@ -6,17 +6,17 @@
> > vector(4) double vect__1.16;
> > vector(4) double vect__1.9;
> > vector(4) double vect__1.7;
> > - double _5;
> > vector(2) double _17;
> > + double _20;
> >
> > <bb 2> [local count: 1073741824]:
> > vect__1.7_19 = MEM <vector(4) double> [(double *)&c];
> > - vect__1.9_21 = VEC_PERM_EXPR <vect__1.7_19, vect__1.7_19, { 0, 1, 0,
> > 1 }>;
> > - MEM <vector(4) double> [(double *)&a] = vect__1.9_21;
> > - vect__1.16_25 = VEC_PERM_EXPR <vect__1.7_19, vect__1.7_19, { 2, 3,
> > 2, 3 }>;
> > - MEM <vector(4) double> [(double *)&b] = vect__1.16_25;
> > - _5 = c[4];
> > - _17 = {_5, _5};
> > + _20 = MEM[(double *)&c + 32B];
>
> that looks similar in the end, but trades c[4] for MEM here.  That's
> because we CSEd c[4] with the "dead" load above.
>
> I think we could simply remove that extra scan-tree-dump-times, we
> should instead look to _not_ see a vector load starting at c[4],
> but scan-tree-dump-not is also not very reliable (in a false negative
> sense).
>
> I'll see what that "dead" load is.

Should be fixed by r14-6748-ga8f0278ade1353.

Richard.
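
For readers without the testsuite at hand, the access pattern visible in the quoted dumps can be sketched roughly as below. This is a reconstruction inferred from the dump statements (the `{ 0, 1, 0, 1 }` and `{ 2, 3, 2, 3 }` permutes and the `_17 = {_5, _5}` constructor), not a copy of gcc.dg/vect/bb-slp-pr78205.c. The array sizes and the `check_foo` helper are assumptions made for illustration only.

```c
#include <assert.h>

/* Hypothetical reconstruction inferred from the quoted dumps; the real
   test file may differ.  The array sizes are assumptions.  */
double x[2], a[4], b[4], c[5];

void
foo (void)
{
  /* Store group fed by c[0] and c[1]; the slp2 dump shows this as a
     vector(4) load of c plus a { 0, 1, 0, 1 } permute.  */
  a[0] = c[0];
  a[1] = c[1];
  a[2] = c[0];
  a[3] = c[1];

  /* Store group fed by c[2] and c[3]; permute { 2, 3, 2, 3 }.  */
  b[0] = c[2];
  b[1] = c[3];
  b[2] = c[2];
  b[3] = c[3];

  /* With c[] assumed to have five elements, a vector(4) double load
     starting at c[4] (&c + 32B) would read past the end of the array,
     so this element has to be loaded as a scalar.  */
  x[0] = c[4];
  x[1] = c[4];
}

/* Hypothetical helper: verifies the scalar semantics that any
   vectorized form of foo has to preserve.  Returns 1 on success.  */
int
check_foo (void)
{
  for (int i = 0; i < 5; i++)
    c[i] = i + 1.0;

  foo ();

  assert (a[0] == 1.0 && a[1] == 2.0 && a[2] == 1.0 && a[3] == 2.0);
  assert (b[0] == 3.0 && b[1] == 4.0 && b[2] == 3.0 && b[3] == 4.0);
  assert (x[0] == 5.0 && x[1] == 5.0);
  return 1;
}
```

The failing directive only matches the textual form ` = c[4];` in the optimized dump. The same scalar load printed as `MEM[(double *)&c + 32B]` therefore fails the scan even though the generated code is equivalent.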