------- Comment #24 from victork at gcc dot gnu dot org 2008-02-11 12:23 -------
Hi,
Here are some more of my observations.

1. For some reason that is not yet clear, there is indeed not much difference
   between the vectorized and non-vectorized versions for long runs such as
   "time ./TestNoVec 92200 8 89720 1000", but the difference is much more
   apparent for shorter runs:

[EMAIL PROTECTED]:~> time ./mnovec 30000 8 29720 1000

real    0m1.738s
user    0m1.723s
sys     0m0.004s

[EMAIL PROTECTED]:~> time ./mvec 30000 8 29720 1000

real    0m0.781s
user    0m0.778s
sys     0m0.003s

2. If you replace new() with malloc(), static dependence analysis can prove
   at compile time that pSum, pSum1 and pVec1 are independent, so run-time
   versioning is not required.

3. If the buffers are still allocated with new(), the compiler uses
   "versioning for alias", which in turn forces it to use versioning for
   alignment to guarantee that the store to pVec1 is correctly aligned. This
   is less efficient than loop peeling, since the vectorized version of the
   loop is executed only when itBegin is a multiple of 4.

Here is the version of your program I used to get the above results:

#include <iostream>
#include <stdio.h>
#include <stdlib.h>

typedef float ARRTYPE;

int main ( int argc, char *argv[] )
{
    int m_nSamples = atoi( argv[1] );
    int itBegin = atoi( argv[2] );
    int itEnd = atoi( argv[3] );
    int iSizeMain = atoi( argv[ 4 ] );

    /* Buffers are allocated with malloc() instead of new() (see point 2). */
    ARRTYPE *pSum1 = (ARRTYPE*) malloc (sizeof(ARRTYPE) *100000);
    ARRTYPE *pSum = (ARRTYPE*) malloc (sizeof(ARRTYPE) *100000);

    for ( int it = 0; it < m_nSamples; it++ )
    {
        pSum[ it ] = it / itBegin;
        pSum1[ it ] = itBegin / ( it + 1 );
    }

    ARRTYPE *pVec1 = (ARRTYPE*) malloc (sizeof(ARRTYPE) *m_nSamples);

    for ( int i = 0, j = 0; i < m_nSamples - 5; i++ )
    {
        /* Inner loop that the vectorizer is expected to transform. */
        for( int it = itBegin; it < itEnd; it++ )
            pVec1[ it ] += pSum[ it ] + pSum1[ it ];
    }
    free( pVec1 );
}

[EMAIL PROTECTED]:~> $g -O3 -fno-tree-vectorize -m64 -o mnovec m.c
[EMAIL PROTECTED]:~> $g -O3 -fdump-tree-vect-details -ftree-vectorize -maltivec -m64 -o mvec m.c

[EMAIL PROTECTED]:~> time ./mnovec 30000 1 29720 1000

real    0m1.754s
user    0m1.750s
sys     0m0.003s

[EMAIL PROTECTED]:~> time ./mvec 30000 1 29720 1000

real    0m0.781s
user    0m0.778s
sys     0m0.003s

-- Victor

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35117
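
P.S. To make point 3 more concrete, here is a rough hand-written sketch of what loop versioning amounts to. It is only an illustration, not the code GCC actually emits. The helper names (ranges_disjoint, aligned16, accumulate_versioned, peel_count) are made up for this example, and the 16-byte vector size is an assumption matching AltiVec's 4-float vectors. The versioned function tests for overlap and alignment at run time and falls back to a plain scalar loop when either test fails, which is why an unaligned start takes the slow path. peel_count shows the alternative: how many scalar iterations loop peeling would run first so that the remaining iterations start on an aligned address.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true when [a, a+n) and [b, b+n) do not overlap.
bool ranges_disjoint(const float* a, const float* b, std::size_t n) {
    const auto ua = reinterpret_cast<std::uintptr_t>(a);
    const auto ub = reinterpret_cast<std::uintptr_t>(b);
    const std::uintptr_t bytes = n * sizeof(float);
    return ua + bytes <= ub || ub + bytes <= ua;
}

// Hypothetical helper: true when p sits on a 16-byte boundary
// (assumed vector size: 4 floats).
bool aligned16(const float* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Versioned loop: returns true if the "fast" version was selected.
bool accumulate_versioned(float* dst, const float* s0, const float* s1,
                          int begin, int end) {
    assert(begin <= end);
    const std::size_t n = static_cast<std::size_t>(end - begin);
    const bool fast = ranges_disjoint(dst + begin, s0 + begin, n) &&
                      ranges_disjoint(dst + begin, s1 + begin, n) &&
                      aligned16(dst + begin);
    if (fast) {
        // Version guarded by the checks: no overlap, aligned store.
        // A vectorizer is free to use aligned vector stores here.
        for (int it = begin; it < end; ++it)
            dst[it] += s0[it] + s1[it];
    } else {
        // Fallback version: same computation, kept scalar.
        for (int it = begin; it < end; ++it)
            dst[it] += s0[it] + s1[it];
    }
    return fast;
}

// Loop peeling alternative: number of leading scalar iterations needed
// before p reaches a 16-byte boundary (assumes p is float-aligned).
int peel_count(const float* p) {
    const std::uintptr_t mis = reinterpret_cast<std::uintptr_t>(p) % 16;
    return mis == 0 ? 0 : static_cast<int>((16 - mis) / sizeof(float));
}
```

With a 16-byte-aligned pVec1, a start index of 8 would pass the alignment test while a start index of 1 would not; with peeling, the second case would cost only three extra scalar iterations before the aligned part begins.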