Hi, I run some tests of simple number-crunching loops whenever new architectures and compilers arise.
These tests on recent Intel architectures show similar performance between the gcc and icc compilers at full optimization. However, a recent test on x86_64 showed the open64 compiler outstripping gcc by a factor of 2 to 3. I tried all the obvious flags; nothing helped.

Versions: gcc 4.5.2, Open64 4.2.5.
Processor: AMD Phenom(tm) II X4 840.

A peek at the assembler makes it clear, though. Even with -O3, gcc is not unrolling loops in this code, but opencc does, and profits.

Attached find the C file. It's not pretty, but the guts are in the small routine double_array_mults_by_const(). For your convenience, also attached is the assembler for the innermost loop, generated by the two compilers with the -S flag.

-----------------------------------------------------------------------
Building and running:

$ gcc --std=c99 -O3 -Wall -pedantic mults_by_const.c
$ ./a.out
double array mults by const             450 ms [  1.013193]

$ opencc -std=c99 -O3 -Wall mults_by_const.c
$ ./a.out
double array mults by const             170 ms [  1.013193]
-----------------------------------------------------------------------

Now, the gcc -O3 should have turned on loop unrolling. I tried turning it on explicitly without success. By the way, I also tried. No difference. -march=native and -ffast-math did not affect the time at all.

Cheers!
#ifdef __ICC
#include <mathimf.h>
#else
#include <math.h>
#endif

/* timer stuff ------------------------------------------------ */
#include <sys/time.h>
#include <stdio.h>
#define __USE_XOPEN2K 1
#include <stdlib.h>
#include <sys/resource.h>

static const int who = RUSAGE_SELF;
static struct rusage local;
static time_t tv_sec;

#define START_CLOCK() getrusage(who, &local)
#define MS_SINCE( ) ( tv_usec = local.ru_utime.tv_usec, tv_sec = local.ru_utime.tv_sec, \
                      getrusage( who, &local), \
                      (long)( ( local.ru_utime.tv_sec - tv_sec ) * 1000 \
                            + ( local.ru_utime.tv_usec - tv_usec ) / 1000 ) )

#ifdef __suseconds_t_defined
static suseconds_t tv_usec;
#else
static long tv_usec;
#endif

/* test parameters ------------------------------------------------ */
enum { ITERATIONS = 131072, size = 8192 };

static void double_array_mults_by_const( double dvec[] );

int main( int argc, char *argv[] )
{
    double * restrict dvec = 0;
    void **dvecptr = (void **)&dvec;

    if( 0 == posix_memalign( dvecptr, 16, size * sizeof(double) ) )
    {
        double_array_mults_by_const( dvec );
    }
    return 0;
}

void double_array_mults_by_const( double * restrict dvec )
{
    long i, j;
    const double dval = 1.0000001;

    for( i = 0; i < size; i++ )
        dvec[i] = 1.0;

    START_CLOCK();
    for( j = 0; j < ITERATIONS; j++ )
        for( i = 0; i < size; i++ )
            dvec[i] *= dval;

    printf( "%-38s %4ld ms [%10.6f]\n", "double array mults by const", MS_SINCE(), dvec[0] );
}
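
To make concrete what "unrolling" means for the inner loop above, here is a minimal hand-written sketch. It is only an illustration: the function name scale_unrolled4, its parameters, and the unroll factor of 4 are assumptions chosen for the example, and it is not a transcription of what either compiler actually emits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of the attached file): multiply every
 * element of dvec by dval, with the loop body unrolled four times by hand.
 * The factor 4 is an arbitrary choice for illustration. */
void scale_unrolled4(double *restrict dvec, size_t n, double dval)
{
    size_t i = 0;

    assert(dvec != NULL || n == 0);

    /* Main body: four independent multiplies per loop iteration, so the
     * counter update and branch are paid once per four elements. */
    for (; i + 4 <= n; i += 4) {
        dvec[i]     *= dval;
        dvec[i + 1] *= dval;
        dvec[i + 2] *= dval;
        dvec[i + 3] *= dval;
    }

    /* Remainder: handles the last 0 to 3 elements when n is not a
     * multiple of 4. */
    for (; i < n; i++)
        dvec[i] *= dval;
}
```

In the test program this would stand in for the inner `for( i = 0; i < size; i++ )` loop, called as `scale_unrolled4(dvec, size, dval)` once per outer iteration. Whether it closes the gap with opencc on this machine is something I have not measured.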

Attachments: gcc.asm, opencc.asm