It seems the auto-vectorizer cannot recognize that the loop below iterates at most 3 times, so it generates quite messy code.

  int a[1024], b[1024];

  void foo (int n)
  {
    int i;
    for (i = (n/4)*4; i < n; i++)
      a[i] = a[i] + b[i];
  }

How can we correctly estimate the number of iterations for this case and use this info for the vectorizer?

Thanks,
Changpeng
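
For reference, here is a minimal sketch of why the bound is 3, assuming C99 or later (integer division truncates toward zero). The names remainder_trip_count, max_trip_count and foo_explicit are hypothetical helpers introduced only for illustration. foo_explicit is one possible source-level rewrite that puts the bound into the loop condition itself. Whether the vectorizer actually takes advantage of it is not something this sketch demonstrates.

```c
#include <assert.h>

int a[1024], b[1024];

/* Trip count of: for (i = (n/4)*4; i < n; i++)
   For n >= 0 the start is n rounded down to a multiple of 4, so the
   count is n % 4.  For n < 0 truncation rounds toward zero, the start
   is >= n, and the loop body never runs. */
static int remainder_trip_count (int n)
{
  int start = (n / 4) * 4;
  int trips = start < n ? n - start : 0;
  assert (trips >= 0 && trips <= 3);   /* the bound in question */
  return trips;
}

/* Largest trip count seen for any n in [lo, hi]. */
static int max_trip_count (int lo, int hi)
{
  int max = 0;
  for (int n = lo; n <= hi; n++)
    {
      int t = remainder_trip_count (n);
      if (t > max)
        max = t;
    }
  return max;
}

/* Same effect as foo, but the loop runs from 0 to n % 4, so the upper
   bound is a value in [-3, 3] rather than n itself.
   Assumes n <= 1024, as the original does. */
void foo_explicit (int n)
{
  int rem = n % 4;        /* <= 0 when n <= 0, so the loop is skipped */
  int base = n - rem;     /* equals (n/4)*4 */
  for (int k = 0; k < rem; k++)
    a[base + k] = a[base + k] + b[base + k];
}
```

Calling max_trip_count over a range of n that includes negative values is a quick way to check that the count never exceeds 3.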