> Posting some random numbers without a test-case and precise command line
> parameters for both compilers makes the numbers useless, IMHO. You also
> only mention instruction counts. Have you actually benchmarked the
> resulting code? CPUs are complicated and what you might perceive as worse
> code might actually be superior thanks to scheduling and internal CPU
> parallelism etc.

Thanks for the reminder. After some investigation, I can demonstrate the
issue with the following piece of code:

-------------------------------------begin here-------------------
extern int *p[5];

# define REAL_RADIX_2 24
# define REAL_MUL_2(x, y) (((long long)(x) * (long long)(y)) >> REAL_RADIX_2)

void func(int *b1, int *b2)
{
  int c0 = p[3][0];
  int c1 = p[3][1];

  b2[0x18] = b1[0x18] + b1[0x1B];
  b2[0x1B] = REAL_MUL_2((b1[0x18] - b1[0x1B]) , c0);
  b2[0x19] = b1[0x19] + b1[0x1A];
  b2[0x1A] = REAL_MUL_2((b1[0x19] - b1[0x1A]) , c1);
  b2[0x1C] = b1[0x1C] + b1[0x1F];
  b2[0x1F] = REAL_MUL_2((b1[0x1F] - b1[0x1C]) , c0);
  b2[0x1D] = b1[0x1D] + b1[0x1E];
  b2[0x1E] = REAL_MUL_2((b1[0x1E] - b1[0x1D]) , c1);
}
-------------------------------------cut here-------------------

It seems GCC 4.3.4 always expands the long long multiplication into three
32-bit multiplications (mult, madd and multu below), for example:

-------------------------------------begin here-------------------
# b2[0x1A] = REAL_MUL_2((b1[0x19] - b1[0x1A]) , c1);
lw $6,104($4)
lw $2,100($4)
subu $2,$2,$6
mult $11,$2
sra $6,$2,31
madd $6,$9
mflo $6
multu $2,$9
mfhi $3
addu $3,$6,$3
sll $6,$3,8
mflo $2
srl $7,$2,24
or $7,$6,$7
sw $7,104($5)
-------------------------------------cut here-------------------

while GCC 3.4.4 treats the long long multiplication like a simple one and
generates only one mult insn for each statement, for example:

-------------------------------------begin here-------------------
# b2[0x1A] = REAL_MUL_2((b1[0x19] - b1[0x1A]) , c1);
lw $2,100($4)
lw $7,104($4)
subu $3,$2,$7
mult $3,$9
mflo $6
mfhi $25
srl $15,$6,24
sll $24,$25,8
or $14,$15,$24
sw $14,104($5)
-------------------------------------cut here-------------------

As I understand it, it is not necessary to use three mult insns to implement
the long long multiplication here, since both operands are converted from
int.

As before, the compile options are "-march=mips32r2 -O3".

Thanks.

-- 
Best Regards.
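P.S. To make the last point concrete, here is a small sketch in portable C that models both code sequences and checks that they produce the same 32-bit result. The function names (mul_single, mul_three, agree) are made up for this illustration. The mapping of each C expression to a MIPS insn is my reading of the listings above, including the assumption that $11 holds the sign word of c1. It is not taken from the GCC sources.

```c
#include <assert.h>
#include <stdint.h>

#define REAL_RADIX_2 24

/* Models the GCC 3.4.4 sequence: one signed 32x32->64 multiply (mult),
 * then bits 24..55 of the product assembled from HI and LO. */
static uint32_t mul_single(int32_t x, int32_t y)
{
    uint64_t prod = (uint64_t)((int64_t)x * (int64_t)y);
    uint32_t lo = (uint32_t)prod;          /* mflo */
    uint32_t hi = (uint32_t)(prod >> 32);  /* mfhi */
    return (lo >> REAL_RADIX_2) | (hi << (32 - REAL_RADIX_2));
}

/* Models the GCC 4.3.4 sequence: a generic 64x64->64 multiply built from
 * three 32-bit multiplies on the sign-extended operands. */
static uint32_t mul_three(int32_t x, int32_t y)
{
    uint64_t ux = (uint64_t)(int64_t)x;    /* sign-extend to 64 bits */
    uint64_t uy = (uint64_t)(int64_t)y;
    uint32_t xl = (uint32_t)ux, xh = (uint32_t)(ux >> 32);
    uint32_t yl = (uint32_t)uy, yh = (uint32_t)(uy >> 32);

    /* multu: unsigned low x low, full 64-bit result */
    uint64_t low = (uint64_t)xl * yl;
    /* mult + madd: the two cross products, only their low 32 bits matter */
    uint32_t cross = (uint32_t)((uint64_t)xl * yh + (uint64_t)xh * yl);

    uint32_t lo = (uint32_t)low;
    uint32_t hi = (uint32_t)(low >> 32) + cross;   /* addu */
    return (lo >> REAL_RADIX_2) | (hi << (32 - REAL_RADIX_2));
}

/* Returns 1 when both strategies give the same result for x and y. */
static int agree(int32_t x, int32_t y)
{
    return mul_single(x, y) == mul_three(x, y);
}
```

Because xh and yh are only copies of the sign bits, the cross terms carry no information that a signed 32x32->64 multiply does not already produce, which is why a single mult should be enough for this macro. Calling agree() over a range of inputs from a small main() is an easy way to convince yourself of that.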