I have some code of the form

    const int primes[] = {7, 11, 13, 17, 19};
    const int nprimes = sizeof(primes) / sizeof(primes[0]);

and later an innermost loop of the form

    bool happy = true;
    for (int i = 0; i < nprimes && happy; i++) {
        int j = dat % primes[i];
        if (filter[i][j] == 1) happy = false;
    }

When I look at the assembly generated on x86_64 with g++ -O9 -funroll-all-loops, I see a sequence of explicit 'idiv %r13d' instructions, where %r13d is initialised to 13 at the start of the function and never changed thereafter. On Core2, idiv by a register that happens to hold a constant is much slower than the multiply-by-reciprocal sequence that gcc generates when it recognises a division by a constant. The program speeds up by a factor of three if I replace the loop with

    if (filter[0][dat % 7] == 1) happy = false;
    if (happy && filter[1][dat % 11] == 1) happy = false;
    if (happy && filter[2][dat % 13] == 1) happy = false;
    if (happy && filter[3][dat % 17] == 1) happy = false;
    if (happy && filter[4][dat % 19] == 1) happy = false;

and the generated assembly then contains no divide instructions.

Is there any combination of -f options that causes division by a constant to be recognised after the loop-unrolling stage?

Tom
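This does not answer the -f question. It is a possible source-level workaround that keeps the table of primes in one place but does the unrolling in the source, so that every "dat % P" has a literal divisor before gcc's division expansion runs. It is a minimal sketch, assuming C++17 (for fold expressions), a non-negative dat, and a hypothetical filter layout of one row per prime with 19 columns. The names kPrimes, Filter, passes and is_happy are illustrative and do not come from the original code. Check the -S output to confirm that no idiv remains with your gcc version.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>

// The divisors live in a constexpr array, so each element can be passed
// as a template argument, i.e. as a compile-time constant.
constexpr std::array<int, 5> kPrimes = {7, 11, 13, 17, 19};
constexpr std::size_t kNumPrimes = kPrimes.size();

// Hypothetical filter layout: filter[i][r] == 1 rejects residue r
// modulo kPrimes[i]. The inner size 19 covers the largest prime.
using Filter = std::array<std::array<unsigned char, 19>, kNumPrimes>;

// One test per prime. P is a template argument, so `dat % P` is a
// division by a literal, which the compiler may lower to a
// multiply-by-reciprocal sequence instead of idiv.
template <std::size_t I, int P>
inline bool passes(int dat, const Filter& filter) {
    assert(dat >= 0);  // assumption: a negative dat would index out of range
    return filter[I][dat % P] != 1;
}

// Expands to passes<0, 7>(...) && passes<1, 11>(...) && ...
// The && fold short-circuits, matching the early exit of the original loop.
template <std::size_t... Is>
inline bool is_happy_impl(int dat, const Filter& filter,
                          std::index_sequence<Is...>) {
    return (passes<Is, kPrimes[Is]>(dat, filter) && ...);
}

// Drop-in replacement for the loop: returns the final value of `happy`.
inline bool is_happy(int dat, const Filter& filter) {
    return is_happy_impl(dat, filter, std::make_index_sequence<kNumPrimes>{});
}
```

Usage would be along the lines of "bool happy = is_happy(dat, filter);". Adding or removing a prime only touches kPrimes (and the inner filter size), so you do not have to maintain the hand-unrolled if-chain.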