Hi, I run some tests of simple number-crunching loops whenever new architectures and compilers arise.
These tests on recent Intel architectures show similar performance
between gcc and icc compilers, at full optimization.
However, a recent test on x86_64 showed the Open64 compiler
outstripping gcc by a factor of 2 to 3. I tried all the obvious
flags; nothing helped.
Versions: gcc 4.5.2, Open64 4.2.5. AMD Phenom(tm) II X4 840 Processor.
A peek at the assembler output makes the cause clear, though. Even with
-O3, gcc is not unrolling the loops in this code, but opencc does, and
profits.
Attached find the C file. It's not pretty but the guts are in the
small routine double_array_mults_by_const().
For your convenience, also attached is the assembler for the innermost
loop, generated by the two compilers with the -S flag.
-----------------------------------------------------------------------
Building and running:
$ gcc --std=c99 -O3 -Wall -pedantic mults_by_const.c
$ ./a.out
double array mults by const 450 ms [ 1.013193]
$ opencc -std=c99 -O3 -Wall mults_by_const.c
$ ./a.out
double array mults by const 170 ms [ 1.013193]
-----------------------------------------------------------------------
Now, the gcc -O3 should have turned on loop unrolling. I tried turning
it on explicitly without success.
By the way, I also tried -march=native and -ffast-math. Neither
affected the time at all.
Cheers!
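/* Request the POSIX.1-2001/XSI declarations (posix_memalign, getrusage)
   before any system header is included. glibc's internal __USE_* macros
   are not meant to be defined by user code. */
#define _XOPEN_SOURCE 600
#ifdef __ICC
#include <mathimf.h>
#else
#include <math.h>
#endif
/* timer stuff ------------------------------------------------ */
#include <sys/time.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>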
static const int who = RUSAGE_SELF;
static struct rusage local;
static time_t tv_sec;
#define START_CLOCK() getrusage(who, &local)
#define MS_SINCE( ) ( tv_usec = local.ru_utime.tv_usec, tv_sec = local.ru_utime.tv_sec, \
                      getrusage( who, &local ), \
                      (long)( ( local.ru_utime.tv_sec - tv_sec ) * 1000 \
                            + ( local.ru_utime.tv_usec - tv_usec ) / 1000 ) )
#ifdef __suseconds_t_defined
static suseconds_t tv_usec;
#else
static long tv_usec;
#endif
/* test parameters ------------------------------------------------ */
enum {
    ITERATIONS = 131072,
    size = 8192
};
static void double_array_mults_by_const( double dvec[] );
int
main( int argc, char *argv[] )
{
    void *mem = NULL;   /* posix_memalign() wants a void **; avoids casting &dvec */

    if( 0 == posix_memalign( &mem, 16, size * sizeof(double) ) )
    {
        double * restrict dvec = mem;
        double_array_mults_by_const( dvec );
        free( mem );
    }
    return 0;
}
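/* Fill the array with 1.0, then time ITERATIONS passes that multiply every
   element by a constant. dvec[0] is printed as a check value. */
void
double_array_mults_by_const( double * restrict dvec )
{
    long i, j;
    const double dval = 1.0000001;

    for( i = 0; i < size; i++ )
        dvec[i] = 1.0;

    START_CLOCK();
    for( j = 0; j < ITERATIONS; j++ )
        for( i = 0; i < size; i++ )
            dvec[i] *= dval;    /* innermost loop under test */

    printf( "%-38s %4ld ms [%10.6f]\n",
            "double array mults by const", MS_SINCE(), dvec[0] );
}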
Attachments: gcc.asm, opencc.asm
