I have been looking at a significant performance regression in the hmmer application between GCC 3.4 and GCC 4.0. I have a small cut-down test case (attached) that demonstrates the problem; it runs more than 10% slower on IA64 (HP-UX or Linux) when compiled with GCC 4.0 than when compiled with GCC 3.4. At first I thought this was just due to 'better' alias analysis in the P7Viterbi routine and that it was the right thing to do even if it was slower. It looked like GCC 3.4 does not believe that hmm->tsc could alias mmx, but GCC 4.0 thinks they could, and thus GCC 4.0 does more loads inside the inner loop of P7Viterbi. But then I noticed something weird: if I remove the field M (which is unused in my example) from the plan7_s structure, the code compiled with GCC 4.0 runs as fast as the code compiled with GCC 3.4. I don't understand why this would affect things.
Any optimization experts care to take a look at this test case and help me understand what is going on and if this change from 3.4 to 4.0 is intentional or not?

Steve Ellcey
[EMAIL PROTECTED]

------------------------ Test Case -----------------------

#define L_CONST 500

void *malloc(long size);

struct plan7_s {
  int M;
  int **tsc;   /* transition scores [0.6][1.M-1] */
};

struct dpmatrix_s {
  int **mmx;
};

struct dpmatrix_s *mx;

void
AllocPlan7Body(struct plan7_s *hmm, int M)
{
  int i;

  hmm->tsc = malloc (7 * sizeof(int *));
  hmm->tsc[0] = malloc ((M+16) * sizeof(int));
  mx->mmx = (int **) malloc(sizeof(int *) * (L_CONST+1));
  for (i = 0; i <= L_CONST; i++) {
    mx->mmx[i] = malloc (M+2+16);
  }
  return;
}

void
P7Viterbi(int L, int M, struct plan7_s *hmm, int **mmx)
{
  int i,k;

  for (i = 1; i <= L; i++) {
    for (k = 1; k <= M; k++) {
      mmx[i][k] = mmx[i-1][k-1] + hmm->tsc[0][k-1];
    }
  }
}

main ()
{
  struct plan7_s *hmm;
  char dsq[L_CONST];
  int i;

  hmm = (struct plan7_s *) malloc (sizeof (struct plan7_s));
  mx = (struct dpmatrix_s *) malloc (sizeof (struct dpmatrix_s));
  AllocPlan7Body(hmm, 10);

  for (i = 0; i < 600000; i++) {
    P7Viterbi(500, 10, hmm, mx->mmx);
  }
}
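
One way to probe the aliasing hypothesis described above might be to hoist the pointer loads out of the inner loop by hand and see whether the GCC 4.0 slowdown disappears. The following is only a sketch under that assumption: the names viterbi_plain, viterbi_hoisted, make_matrix, free_matrix and variants_agree are hypothetical and not part of hmmer or of the original test case, and the buffers are sized in ints rather than with the test case's allocation. The hoisted form is only equivalent when the stores to mmx[i][k] never overwrite hmm->tsc[0] or the row pointers of mmx.

```c
#include <assert.h>
#include <stdlib.h>

struct plan7_s {
    int   M;      /* unused here, kept so the layout matches the test case */
    int **tsc;
};

/* Same loop as the test case: as written, hmm->tsc[0], mmx[i] and
   mmx[i-1] may be re-read on every iteration unless the compiler can
   prove that the store to mmx[i][k] leaves them unchanged. */
void viterbi_plain(int L, int M, struct plan7_s *hmm, int **mmx)
{
    int i, k;
    for (i = 1; i <= L; i++)
        for (k = 1; k <= M; k++)
            mmx[i][k] = mmx[i-1][k-1] + hmm->tsc[0][k-1];
}

/* Hypothetical variant: the row pointers are copied into locals, so no
   pointer has to be reloaded inside the inner loop. Only valid when the
   stores really do not overwrite those pointers. */
void viterbi_hoisted(int L, int M, struct plan7_s *hmm, int **mmx)
{
    int i, k;
    int *tsc0 = hmm->tsc[0];
    for (i = 1; i <= L; i++) {
        int *cur  = mmx[i];
        int *prev = mmx[i-1];
        for (k = 1; k <= M; k++)
            cur[k] = prev[k-1] + tsc0[k-1];
    }
}

/* Hypothetical helper: rows x cols ints, filled with a simple pattern. */
static int **make_matrix(int rows, int cols)
{
    int r, c;
    int **m = malloc(rows * sizeof *m);
    assert(m != NULL);
    for (r = 0; r < rows; r++) {
        m[r] = malloc(cols * sizeof **m);
        assert(m[r] != NULL);
        for (c = 0; c < cols; c++)
            m[r][c] = r + c;
    }
    return m;
}

static void free_matrix(int **m, int rows)
{
    int r;
    for (r = 0; r < rows; r++)
        free(m[r]);
    free(m);
}

/* Returns 1 if both variants fill an (L+1) x (M+1) matrix identically. */
int variants_agree(int L, int M)
{
    struct plan7_s hmm;
    int **a, **b;
    int i, k, same = 1;

    hmm.M = M;
    hmm.tsc = make_matrix(7, M + 16);
    a = make_matrix(L + 1, M + 1);
    b = make_matrix(L + 1, M + 1);

    viterbi_plain(L, M, &hmm, a);
    viterbi_hoisted(L, M, &hmm, b);

    for (i = 0; i <= L; i++)
        for (k = 0; k <= M; k++)
            if (a[i][k] != b[i][k])
                same = 0;

    free_matrix(hmm.tsc, 7);
    free_matrix(a, L + 1);
    free_matrix(b, L + 1);
    return same;
}
```

Calling variants_agree(500, 10) checks that the two loops compute the same matrix for the sizes used in the test case. Timing each variant in a driver loop like the one in main above, under both compilers, could then indicate whether the extra loads account for the difference; the sketch itself makes no claim about what those timings would be.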