I tried to simplify the testcase below and ended up with a completely different testcase, but it causes the same problem.
It seems to be about FPU registers: if anything causes the compiler to store the value to memory, it then treats the value as if it were volatile.
#include <iostream>

void otherfunc();
void test() {
    double result = 0.0;  // stored in FPU register
    otherfunc();          // "result" saved to memory
    for (int j = 1; j < 100000000; ++j)
        result += 1.0;    // "result" read from and written to memory each iteration
    std::cerr << result << std::endl;
}
On i386 the inner loops look like this:

without otherfunc():
.L7:
    decl %eax
    fadd %st, %st(1)
    jns .L7

with otherfunc():
.L7:
    fldl -8(%ebp)
    decl %eax
    fadd %st(1), %st    //FPU stack ordering problem?
    fstl -8(%ebp)
    js .L11
    fstp %st(0)
    jmp .L7
.L11:
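A minimal sketch for comparing the two variants in one program. The names noop, loop_without_call, loop_with_call and time_ms are hypothetical and not part of the testcase, and the volatile function pointer is only an assumed stand-in for an otherfunc() defined in another translation unit. Whether the slowdown shows up at all depends on the compiler version, target and flags.

```cpp
#include <chrono>
#include <iostream>

// Hypothetical stand-in for the external otherfunc() of the testcase.
static void noop() {}

// Calling through a volatile pointer keeps the callee opaque to the optimizer,
// roughly like a function defined in another translation unit (assumption).
static void (*volatile otherfunc)() = noop;

// Variant without the call.
double loop_without_call(int n) {
    double result = 0.0;
    for (int j = 1; j < n; ++j) result += 1.0;
    return result;
}

// Variant with the call: "result" is live across otherfunc().
double loop_with_call(int n) {
    double result = 0.0;
    otherfunc();
    for (int j = 1; j < n; ++j) result += 1.0;
    return result;
}

// Wall-clock time of one call to f(n) in milliseconds.
// The result is printed so the loop cannot be discarded as dead code.
template <class F>
double time_ms(F f, int n) {
    auto start = std::chrono::steady_clock::now();
    double r = f(n);
    auto stop = std::chrono::steady_clock::now();
    std::cerr << r << std::endl;
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling time_ms(loop_without_call, 100000000) and time_ms(loop_with_call, 100000000) from main gives two numbers to compare; looking at the generated assembly is still the more reliable check.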
I don't really know what keywords to search for in Bugzilla. Could anyone please check whether this is a known bug?
Regards,
Benjamin Redelings I wrote:
Hi,
I have a C++ program that runs slower under 4.0 CVS than 3.4. So, I am trying to make some test-cases that might help deduce the reason. However, when I reduced this testcase sufficiently, it began behaving badly under BOTH 3.4 and 4.0.... but I guess I should start with the most reduced case first.
Basically, the code just does a lot of multiplies and adds. However, if I take the main loop outside of an if-block, it goes 5x faster. Also, if I implement an array as 'double*' instead of 'vector<double>' it also goes 5x faster. Using valarray<double> instead of vector<double> does not give any improvement.
MATH INSIDE IF-BLOCK
% time ./2h 1
double addition
result = 83283300.006041

real 0m0.995s
user 0m1.000s
sys 0m0.000s

MATH OUTSIDE IF-BLOCK
% time ./2i 1
result = 83283299.999998

real 0m0.218s
user 0m0.220s
sys 0m0.000s
Should I submit a PR? Any help would be appreciated...
-BenRI
------------ begin testcase -------------
#include <cstdio>
#include <cstdlib>
#include <vector>

const int OUTER = 100000;
const int INNER = 1000;

using namespace std;

int main(int argn, char *argv[]) {
    int s = atoi(argv[1]);

    double result;
    if (s == 1) {  // remove this condition to get a 5x speedup
        // initialize d
        vector<double> d(INNER);  // change to double* to get 5x speedup
        for (int i = 0; i < INNER; i++)
            d[i] = double(1 + i) / INNER;

        // calc result
        result = 0;
        for (int i = 0; i < OUTER; ++i)
            for (int j = 1; j < INNER; ++j)
                result += d[j] * d[j - 1] + d[j - 1];
    } else
        exit(-1);

    printf("result = %f\n", result);
    return 0;
}
----------- end testcase --------------
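To compare the vector<double> and double* variants without editing the testcase by hand, the arithmetic can be written once as a template. This is only a sketch: accumulate_products, run_vector and run_pointer are hypothetical names, and going through a template may itself change what the optimizer does compared with the original main().

```cpp
#include <vector>

// Same arithmetic as the testcase, written once for any indexable container.
template <class Array>
double accumulate_products(const Array& d, int inner, int outer) {
    double result = 0;
    for (int i = 0; i < outer; ++i)
        for (int j = 1; j < inner; ++j)
            result += d[j] * d[j - 1] + d[j - 1];
    return result;
}

// Variant using vector<double>, as in the testcase.
double run_vector(int inner, int outer) {
    std::vector<double> d(inner);
    for (int i = 0; i < inner; i++)
        d[i] = double(1 + i) / inner;
    return accumulate_products(d, inner, outer);
}

// Variant using a plain double* array.
double run_pointer(int inner, int outer) {
    double* d = new double[inner];
    for (int i = 0; i < inner; i++)
        d[i] = double(1 + i) / inner;
    double result = accumulate_products(d, inner, outer);
    delete[] d;
    return result;
}
```

Both functions should return the same value for the same arguments, so any timing difference between run_vector(INNER, OUTER) and run_pointer(INNER, OUTER) comes from the generated code and not from the data.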
-- Stefan Strasser