>> It seems the auto-vectorizer cannot recognize that this loop will
>> iterate at most 3 times,
>> and it generates quite messy code.
>>
>> int a[1024], b[1024];
>> void foo (int n)
>> {
>>   int i;
>>   for (i = (n/4)*4; i < n; i++)
>>     a[i] = a[i] + b[i];
>> }
>>
>> How can we correctly estimate the number of iterations for this case
>> and use this info for the vectorizer?
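To make the "at most 3 times" claim concrete: (n/4)*4 rounds n toward zero to a multiple of 4, so for n >= 0 the loop runs exactly n % 4 times, and for negative n it does not run at all. The brute-force check below is only a sketch; the helpers trip_count_div and check_div_bound are hypothetical names for illustration, not anything in GCC.

```c
#include <assert.h>

/* Count the iterations of: for (i = (n/4)*4; i < n; i++) */
static int trip_count_div(int n)
{
    int count = 0;
    for (int i = (n / 4) * 4; i < n; i++)
        count++;
    return count;
}

/* Check over [lo, hi] that the loop never runs more than 3 times,
   and that for n >= 0 it runs exactly n % 4 times. */
static void check_div_bound(int lo, int hi)
{
    for (int n = lo; n <= hi; n++) {
        int t = trip_count_div(n);
        assert(t >= 0 && t <= 3);
        if (n >= 0)
            assert(t == n % 4);
        else
            assert(t == 0); /* C division truncates toward zero, so (n/4)*4 >= n */
    }
}
```

For example, n = 1023 starts at i = 1020 and runs 3 times, while n = 1024 does not run at all.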

>Does it recognise it if you rewrite the loop as follows:

>for (i = n&~0x3; i < n; i++)
>    a[i] = a[i] + b[i];

NO.  
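For n >= 0, n&~0x3 and (n/4)*4 give the same starting index, so the suggested rewrite does not change the loop; they differ only for negative n (n = -5 gives -8 vs. -4), assuming two's complement int as GCC uses. A small sketch with hypothetical helper names:

```c
#include <assert.h>

/* Start index of the original loop: rounds toward zero. */
static int start_div(int n) { return (n / 4) * 4; }

/* Start index of the suggested rewrite: clears the two low bits,
   i.e. rounds toward minus infinity on a two's complement int. */
static int start_mask(int n) { return n & ~0x3; }

/* For 0 <= n <= hi both forms round n down to a multiple of 4. */
static void check_same_start(int hi)
{
    for (int n = 0; n <= hi; n++)
        assert(start_div(n) == start_mask(n));
}
```

Since a[] and b[] are only indexed with non-negative n in practice, the two forms should be interchangeable here.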

But it works for the following case:

 for (i = n-3; i < n; i++)
     a[i] = a[i] + b[i];

It seems to fail in the "unknown but small" case. In any event, this mostly
affects compilation time and code size, and has limited impact on
performance.
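One plausible reading of the difference: with i = n-3 the trip count is the constant 3, whereas with i = n&~0x3 it is n&3, which is unknown at compile time even though it is bounded by 3. The sketch below uses hypothetical helpers (not GCC internals) to check both facts by counting iterations; it ignores the signed overflow of n-3 near INT_MIN.

```c
#include <assert.h>

/* Iterations of: for (i = n - 3; i < n; i++)
   Always exactly 3 (as long as n - 3 does not overflow). */
static int trip_count_minus3(int n)
{
    int count = 0;
    for (int i = n - 3; i < n; i++)
        count++;
    return count;
}

/* Iterations of: for (i = n & ~0x3; i < n; i++)
   Equals n & 3: bounded by 3 but not a compile-time constant. */
static int trip_count_mask(int n)
{
    int count = 0;
    for (int i = n & ~0x3; i < n; i++)
        count++;
    return count;
}

/* Check both claims for every n in [lo, hi]. */
static void check_known_vs_bounded(int lo, int hi)
{
    for (int n = lo; n <= hi; n++) {
        assert(trip_count_minus3(n) == 3);
        assert(trip_count_mask(n) == (n & 3));
    }
}
```

The identity n == (n & ~0x3) + (n & 3) holds because the two masks select disjoint bits, which is why the second trip count is always n & 3.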

For the loop

for (i = n&~0x3; i < n; i++)
    a[i] = a[i] + b[i];

the attached foo-O3-no-tree-vectorize.s is what we expect from the optimizer;
foo-O3.s is much worse.
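For comparison, here is one way to write the n&~0x3 loop by hand so that the bound of 3 iterations is explicit: a switch on n&3 with fall-through, with no loop at all. This is only a C illustration, not the contents of the attached .s files; foo_unrolled and check_unrolled are hypothetical names, and the code assumes 0 <= n <= 1024 so every index stays inside the arrays.

```c
#include <assert.h>
#include <string.h>

int a[1024], b[1024];

/* The loop from the thread. */
void foo(int n)
{
    int i;
    for (i = n & ~0x3; i < n; i++)
        a[i] = a[i] + b[i];
}

/* Hypothetical hand-unrolled equivalent: at most 3 additions execute.
   Assumes 0 <= n <= 1024. */
void foo_unrolled(int n)
{
    int i = n & ~0x3;

    switch (n & 3) {
    case 3: a[i + 2] += b[i + 2]; /* fall through */
    case 2: a[i + 1] += b[i + 1]; /* fall through */
    case 1: a[i] += b[i]; break;
    default: break; /* n & 3 == 0: no remainder elements */
    }
}

/* Compare foo and foo_unrolled on the same input for every n in [0, 1024]. */
void check_unrolled(void)
{
    static int expected[1024];

    for (int n = 0; n <= 1024; n++) {
        for (int k = 0; k < 1024; k++) {
            a[k] = k;
            b[k] = 3 * k + 1;
        }
        foo(n);
        memcpy(expected, a, sizeof a);

        for (int k = 0; k < 1024; k++)
            a[k] = k;
        foo_unrolled(n);
        assert(memcmp(expected, a, sizeof a) == 0);
    }
}
```

The element updates are independent, so the order in which the switch performs them does not matter.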

Thanks,

Changpeng


 

Attachment: foo-O3-no-tree-vectorize.s

Attachment: foo-O3.s
