>> It seems the auto-vectorizer could not recognize that this loop will
>> roll at most 3 times.
>> And it will generate quite messy code.
>>
>> int a[1024], b[1024];
>> void foo (int n)
>> {
>>   int i;
>>   for (i = (n/4)*4; i< n; i++)
>>     a[i] = a[i] + b[i];
>> }
>>
>> How can we correctly estimate the number of iterations for this case
>> and use this info for the vectorizer?
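A small C sketch of why the loop above rolls at most 3 times. The helper names trip_count_div4 and max_trip_count_div4 are hypothetical illustrations, not GCC internals. C integer division truncates toward zero, so for n >= 0 the loop starts at n - n%4 and runs n%4 times, and for n < 0 the start is already >= n, so it runs zero times.

```c
/* Hypothetical helper: iterations executed by
     for (i = (n/4)*4; i < n; i++)
   (n/4)*4 rounds n toward zero to a multiple of 4, so for n >= 0
   the count is n % 4, and for n < 0 the start is >= n (no iterations). */
static int trip_count_div4(int n)
{
    int start = (n / 4) * 4;
    return start < n ? n - start : 0;
}

/* Hypothetical helper: largest trip count for any n in [lo, hi].
   Assumes hi < INT_MAX so that n++ cannot overflow. */
static int max_trip_count_div4(int lo, int hi)
{
    int max = 0;
    for (int n = lo; n <= hi; n++) {
        int t = trip_count_div4(n);
        if (t > max)
            max = t;
    }
    return max;
}
```

For example, n = 7 starts at i = 4 and runs 3 iterations, n = 8 runs none, and n = -5 starts at i = -4 and runs none, so the bound is 3 even though the exact count is unknown at compile time.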
> Does it recognise it if you rewrite the loop as follows:
> for (i = n&~0x3; i< n; i++)
>   a[i] = a[i] + b[i];

No. But it does handle the following case:

for (i = n-3; i< n; i++)
  a[i] = a[i] + b[i];

It seems to fail when the iteration count is unknown but small. Anyway, this mostly affects compilation time and code size, and has limited impact on performance.

For

for (i = n&~0x3; i< n; i++)
  a[i] = a[i] + b[i];

the attached foo-O3-no-tree-vectorize.s is what we expect from the optimizer; foo-O3.s is much worse.

Thanks,
Changpeng
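A sketch contrasting the two rewrites discussed above (the function names are hypothetical, for illustration only). With n&~0x3 the start rounds n down to a multiple of 4, so the trip count is n & 3: unknown at compile time but always in [0, 3]. With n-3 the trip count is the constant 3 (assuming n - 3 does not overflow), which matches the case reported above as handled. The comments assume a two's-complement target.

```c
/* Hypothetical helper: iterations of
     for (i = n & ~0x3; i < n; i++)
   On two's-complement targets n & ~0x3 rounds n down (toward negative
   infinity) to a multiple of 4, so the count equals n & 3, in [0, 3]. */
static int trip_count_mask(int n)
{
    int start = n & ~0x3;
    return start < n ? n - start : 0;
}

/* Hypothetical helper: iterations of
     for (i = n - 3; i < n; i++)
   The count is the compile-time constant 3, assuming n - 3 does not
   overflow (i.e. n >= INT_MIN + 3). */
static int trip_count_minus3(int n)
{
    int start = n - 3;
    return start < n ? n - start : 0;
}
```

So both forms are bounded by 3, but only the second has a trip count that is a known constant rather than a value that is merely known to be small.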
foo-O3-no-tree-vectorize.s
foo-O3.s