Basically GCC reports that the loop is vectorized, but the vectorized loop is never executed because of the bogus alias check ' a + 16 < a' generated. In trunk, the vectorized version is eliminated, but it remains as a dead code with gcc 48.
David On Mon, Sep 23, 2013 at 5:26 PM, Cong Hou <co...@google.com> wrote: > (I have also created this issue in bug reports: > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58513) > > > First look at the code below: > > > int op(const int* a, const int* b) > { return *a+*b; } > > void foo(int*a, int b) > { > int i; > for (i = 0; i < 100000; ++i) > a[i] = op(a+i, &b); > } > > > GCC will generate the following GIMPLE for this loop after inlining op(): > > > <bb 4>: > # i_15 = PHI <i_9(3), 0(2)> > # ivtmp_23 = PHI <ivtmp_22(3), 100000(2)> > _4 = (long unsigned int) i_15; > _5 = _4 * 4; > _7 = a_6(D) + _5; > _10 = MEM[(const int *)_7]; > _11 = _10 + b_12(D); > *_7 = _11; > i_9 = i_15 + 1; > ivtmp_22 = ivtmp_23 - 1; > if (ivtmp_22 != 0) > goto <bb 3>; > else > goto <bb 5>; > > > > Here each element of the array a is loaded by MEM[(const int *)_7] and > stored by *_7, which are the only two data refs in the loop body. The > GCC vectorizer needs to check the possible aliasing between data refs > with potential data dependence. Here those two data refs are actually > the same one, but GCC could not recognize this fact. As a result, the > aliasing checking predicate will always return false at runtime (GCC > 4.9 could eliminate this generated branch at the end of the > vectorization pass). > > The reason why GCC thinks that MEM[(const int *)_7] and *_7 are two > different data refs is that there is a possible defect in the function > operand_equal_p(), which is used to compare two data refs. The current > implementation uses == to compare the types of the second argument of > MEM_REF operator, which is too strict. Using types_compatible_p() > instead can fix the issue above. I also included a test case for this > bug fix. Bootstrapping and "make check" are both passed. > > > thanks, > Cong > > > > Index: gcc/testsuite/gcc.dg/alias-14.c > =================================================================== > --- gcc/testsuite/gcc.dg/alias-14.c (revision 0) > +++ gcc/testsuite/gcc.dg/alias-14.c (revision 0) > @@ -0,0 +1,24 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -ftree-vectorize" } */ > + > +int op (const int* x, const int* y) > +{ > + return *x + *y; > +} > + > +/* After inlining op() the type of the data ref is converted from int* into > + const int&, resulting in two data refs MEM[(const int *)DR] and *DR for > read > + and write, where DR represents the address of a[i] here. They are still > + the same data ref and no alias exists in the loop. The vectorizer should > + succesffuly vectorize this loop. */ > + > +void foo(int* a, int b) > +{ > + int i; > + for (i = 0; i < 100000; ++i) > + a[i] = op(a + i, &b); > +} > + > + > +/* { dg-final { scan-assembler-times "paddd" 1 { target x86_64-*-* } } } */ > + > Index: gcc/fold-const.c > =================================================================== > --- gcc/fold-const.c (revision 202662) > +++ gcc/fold-const.c (working copy) > @@ -2693,8 +2693,9 @@ operand_equal_p (const_tree arg0, const_ > && operand_equal_p (TYPE_SIZE (TREE_TYPE (arg0)), > TYPE_SIZE (TREE_TYPE (arg1)), flags))) > && types_compatible_p (TREE_TYPE (arg0), TREE_TYPE (arg1)) > - && (TYPE_MAIN_VARIANT (TREE_TYPE (TREE_OPERAND (arg0, 1))) > - == TYPE_MAIN_VARIANT (TREE_TYPE (TREE_OPERAND (arg1, 1)))) > + && types_compatible_p ( > + TYPE_MAIN_VARIANT (TREE_TYPE (TREE_OPERAND (arg0, 1))), > + TYPE_MAIN_VARIANT (TREE_TYPE (TREE_OPERAND (arg1, 1)))) > && OP_SAME (0) && OP_SAME (1)); > > case ARRAY_REF: