https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68892
            Bug ID: 68892
           Summary: [6 Regression] Excessive dead loads produced by BB vectorization
           Product: gcc
           Version: 6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

double a[1024][1024];
double b[1024];

void foo(void)
{
  b[0] = a[0][0];
  b[1] = a[1][0];
  b[2] = a[2][0];
  b[3] = a[3][0];
}

is vectorized using

t.c:10:1: note: Load permutation 0 1024 2048 3072
t.c:10:1: note: Final SLP tree for instance:
t.c:10:1: note: node
t.c:10:1: note:     stmt 0 b[0] = _2;
t.c:10:1: note:     stmt 1 b[1] = _4;
t.c:10:1: note:     stmt 2 b[2] = _6;
t.c:10:1: note:     stmt 3 b[3] = _8;
t.c:10:1: note: node
t.c:10:1: note:     stmt 0 _2 = a[0][0];
t.c:10:1: note:     stmt 1 _4 = a[1][0];
t.c:10:1: note:     stmt 2 _6 = a[2][0];
t.c:10:1: note:     stmt 3 _8 = a[3][0];

where our "stupid" load permutation support first loads all vectors of the
group (of size 3073) and then permutes it, using only 4 vectors of it.

For vectors with more than two elements the "need more than two vectors"
part of load permutation support "fixes" this but for two elements nothing
prevents this stupidity (it's all dead code but IVOPTs for example can take
ages processing the dead loads).

  <bb 2>:
  _2 = a[0][0];
  _4 = a[1][0];
  _6 = a[2][0];
  vect__2.5_10 = MEM[(double *)&a];
  _11 = &a[0][0] + 16;
  vect__2.6_12 = MEM[(double *)_11];
  _13 = _11 + 16;
  vect__2.7_14 = MEM[(double *)_13];
  _15 = _13 + 16;
  vect__2.8_16 = MEM[(double *)_15];
  ...
  _3079 = _3077 + 16;
  vect__2.1540_3080 = MEM[(double *)_3079];
  _3081 = _3079 + 16;
  vect__2.1541_3082 = MEM[(double *)_3081];
  _3083 = _3081 + 18446744073709551608;
  vect__2.1542_3084 = VEC_PERM_EXPR <vect__2.5_10, vect__2.517_1034, { 0, 2 }>;
  vect__2.1543_3085 = VEC_PERM_EXPR <vect__2.1029_2058, vect__2.1541_3082, { 0, 2 }>;
  _8 = a[3][0];
  MEM[(double *)&b] = vect__2.1542_3084;
  _3087 = &b[0] + 16;
  MEM[(double *)_3087] = vect__2.1543_3085;
  return;
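
The arithmetic behind the numbers above can be sketched in a few lines of C. This is only an illustrative model, not code from the vectorizer: the helper names (group_span, vectors_loaded, vectors_used) are hypothetical, the permutation is assumed to be sorted in ascending order, and every vector is assumed to hold the same number of elements (two doubles here, matching the 16-byte pointer steps in the dump).

```c
#include <assert.h>
#include <stddef.h>

/* Number of scalar elements the load group spans: highest index + 1.
   Hypothetical helper, not a GCC function. */
static size_t group_span(const size_t *perm, size_t n)
{
    size_t max = 0;
    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        if (perm[i] > max)
            max = perm[i];
    return max + 1;
}

/* Vector loads needed to cover the whole span when each vector
   holds nunits elements (rounded up). */
static size_t vectors_loaded(size_t span, size_t nunits)
{
    assert(nunits > 0);
    return (span + nunits - 1) / nunits;
}

/* Distinct vectors the permutation actually reads from.
   Assumes perm is sorted in ascending order. */
static size_t vectors_used(const size_t *perm, size_t n, size_t nunits)
{
    size_t used = 0;
    size_t last = 0;
    assert(nunits > 0);
    for (size_t i = 0; i < n; i++) {
        size_t v = perm[i] / nunits;   /* index of the vector holding perm[i] */
        if (i == 0 || v != last) {
            used++;
            last = v;
        }
    }
    return used;
}
```

For the load permutation 0 1024 2048 3072 with two elements per vector, these helpers give a span of 3073 elements, 1537 vector loads and 4 vectors actually used. That agrees with the dump, where the loads run from vect__2.5 to vect__2.1541 and the two VEC_PERM_EXPRs read only vect__2.5, vect__2.517, vect__2.1029 and vect__2.1541; the remaining 1533 loads are dead.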