Given this piece of code (gcc-4.7-20120114):
static void Test(Batch* block,Batch* new_block,const uint32 offs)
{
T* __restrict old_values
=(T*)__builtin_assume_aligned(block->items,16);
T* __restrict new_values
=(T*)__builtin_assume_aligned(new_block->items,16);
//assert(((uint64)(&block->items)%16)==0); //OK!!
//assert(((uint64)(&new_block->items)%16)==0);
for(uint32 c=0;c<(BS<<1);c++) //hopefully compiler applies SIMD here
{
new_values[c]=old_values[c]*old_values[c];
}
}
I would assume that the loop is always vectorized: the pointers are
tagged as restrict and aligned, and the loop runs over a fixed
iteration space that is even a power of 2, so most likely divisible
by 4. It is quite similar to vectorization example22
(http://gcc.gnu.org/projects/tree-ssa/vectorization.html#vectorizab).
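For reference, here is a minimal self-contained version of the snippet that compiles on its own. The definitions of T, BS, uint32 and Batch are not shown above, so the ones used here (float elements, BS = 64, a 16-byte aligned items array) are assumptions for illustration only, and SquaresMatch() is a hypothetical helper that just checks the result.

```cpp
#include <cassert>
#include <cstdint>

// ASSUMPTIONS: these definitions are placeholders, not the real ones.
typedef uint32_t uint32;
typedef float T;               // assumed element type
static const uint32 BS = 64;   // assumed block size (a power of 2)

struct Batch {
    // GCC attribute; makes the 16-byte alignment promise below true.
    T items[BS << 1] __attribute__((aligned(16)));
};

static void Test(Batch* block, Batch* new_block, const uint32 offs)
{
    (void)offs; // not used by the loop shown in this mail
    T* __restrict old_values
        = (T*)__builtin_assume_aligned(block->items, 16);
    T* __restrict new_values
        = (T*)__builtin_assume_aligned(new_block->items, 16);
    for (uint32 c = 0; c < (BS << 1); c++) // candidate loop for SIMD
    {
        new_values[c] = old_values[c] * old_values[c];
    }
}

// Hypothetical helper: fills a batch, runs Test() and verifies the squares.
static bool SquaresMatch()
{
    Batch in, out;
    for (uint32 c = 0; c < (BS << 1); c++)
        in.items[c] = (T)c;
    Test(&in, &out, 0);
    for (uint32 c = 0; c < (BS << 1); c++)
        if (out.items[c] != in.items[c] * in.items[c])
            return false;
    return true;
}
```

With these placeholder types the trip count is 128, so four floats per SSE register would divide it evenly.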
I ran the previously mentioned g++ version with this command line:
-std=c++0x -g -O3 -msse -msse2 -msse3 -msse4.1 -Wall -Wstrict-aliasing=2
-ftree-vectorizer-verbose=2
Looking at the vectorizer output (and at the generated assembly) it
looks as if the loop given above
is indeed vectorized if Test() is called from main() (vectorized 1 loop).
When the function Test() is called from deep inside some complex code,
it looks as if the vectorization analysis gives up because the
surrounding code is too complex to analyze, and it never considers the
loop inside Test() in this context, even though the loop should be
easily vectorizable in any context given the hints inside Test().
Is there anything I can do so that Test() is analyzed in all contexts?
I would expect every function that contains the
__builtin_assume_aligned hint to be considered for vectorization,
independent of its context.
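One experiment that might narrow this down, assuming the difference comes from Test() being inlined into the complex caller (this is only a guess, the vectorizer output does not confirm it): GCC's noinline attribute keeps the loop in a separate function body, so it is compiled on its own regardless of the call site. The sketch below uses hypothetical names (SquareAll, SumOfSquares, N) and plain float arrays instead of Batch.

```cpp
#include <cassert>
#include <cstdint>

// ASSUMPTION: N and the float element type are placeholders.
static const uint32_t N = 128;

// noinline (a GCC attribute) stops the compiler from merging this loop
// into its caller, so it is always compiled as a standalone function.
__attribute__((noinline))
static void SquareAll(const float* in, float* out)
{
    const float* __restrict src
        = (const float*)__builtin_assume_aligned(in, 16);
    float* __restrict dst
        = (float*)__builtin_assume_aligned(out, 16);
    for (uint32_t c = 0; c < N; c++)
        dst[c] = src[c] * src[c];
}

// Hypothetical caller standing in for the "complex code".
static float SumOfSquares()
{
    float in[N] __attribute__((aligned(16)));
    float out[N] __attribute__((aligned(16)));
    for (uint32_t c = 0; c < N; c++)
        in[c] = (float)c;
    SquareAll(in, out);
    float sum = 0.0f;
    for (uint32_t c = 0; c < N; c++)
        sum += out[c];
    return sum;
}
```

Whether this actually restores vectorization in the real code would still have to be checked with -ftree-vectorizer-verbose=2; the price is a real function call at every call site.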
Thx for your help,
Alex