[zebes:~] astrange% /usr/local/bin/g++fsf -v
Using built-in specs.
Target: powerpc-apple-darwin7.7.0
Configured with: ../fsfgcc/configure --program-suffix=fsf --enable-languages=c,c++,java,treelang --enable-cpu=750
Thread model: posix
gcc version 4.1.0 20050321 (experimental)
[zebes:~] astrange% /usr/local/bin/g++fsf -fdump-tree-gimple -fdump-tree-optimized -Os -mmultiple -S -mcpu=750 -mtune=750 -dp ccset.cpp

The source is:

    unsigned int ccblah(unsigned int *q,unsigned int B,unsigned int D,unsigned int E,unsigned int F,unsigned int H)
    {
        bool cfold1, cfold2, cfold3, cfold4, cfold5, cfold6;

        cfold1 = D == B;
        cfold2 = B == F;
        cfold3 = D == H;
        cfold4 = F == H;
        cfold5 = !cfold1 && !cfold4;
        cfold6 = !cfold3 && !cfold2;

        q[0] = cfold1 && cfold6 ? D : E;
        q[1] = cfold2 && cfold5 ? F : E;
        q[2] = cfold3 && cfold5 ? D : E;
        q[3] = cfold4 && cfold6 ? F : E;

        return ((cfold1) ^ (cfold2))?E:F;
    }

Except for the return statement, it is a simplified testcase of the core of
Scale2x (http://scale2x.sourceforge.net/).

GCC generates this for the first block:

    __Z7ccblahPjjjjjj:
    LFB3:
            xor r12,r5,r4   ; 30   *rs6000.md:11675/1  [length = 12]
            subfic r0,r12,0
            adde. r12,r0,r12
            xor r10,r7,r8   ; 27   *rs6000.md:11645/1  [length = 12]
            subfic r0,r10,0
            adde r10,r0,r10
            xor r4,r4,r7    ; 19   *rs6000.md:11645/1  [length = 12]
            subfic r0,r4,0
            adde r4,r0,r4
            xor r11,r5,r8   ; 23   *rs6000.md:11645/1  [length = 12]
            subfic r0,r11,0
            adde r11,r0,r11
            beq- cr0,L27    ; 31   *rs6000.md:13791    [length = 4]
            li r2,0         ; 34   *movsi_internal1/5  [length = 4]
            b L29           ; 151  jump                [length = 4]

The xor/subfic/adde patterns are unnecessary; instead, GCC should use cmpw to
set four different CR fields ("cmpw cr1,r5,r4", "cmpw cr2,r7,r8", etc.).

For the return statement, GCC generates:

    L47:
            cmpw cr7,r30,r4 ; 127  *cmpsi_internal1    [length = 4]
            stw r0,12(r11)  ; 125  *movsi_internal1/4  [length = 4]
            bne+ cr7,L48    ; 128  *rs6000.md:13791    [length = 4]
            mr r3,r7        ; 131  *movsi_internal1/1  [length = 4]
    L48:
            lmw r30,-8(r1)  ; 169  *lmw                [length = 4]
            blr             ; 170  *return_internal_si [length = 4]

There may be a savings from using crxor here instead of another cmpw.
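For anyone experimenting with alternative code sequences for this testcase, a small behavioural check may be handy. The following is only a sketch: `Result`, `run_ccblah`, and `self_test` are hypothetical helpers that are not part of the report, and the expected values were worked out by hand from the source above rather than taken from any GCC output.

```cpp
#include <array>
#include <cassert>

// The testcase from the report, reproduced so this file compiles on its own.
unsigned int ccblah(unsigned int *q, unsigned int B, unsigned int D,
                    unsigned int E, unsigned int F, unsigned int H)
{
    bool cfold1, cfold2, cfold3, cfold4, cfold5, cfold6;

    cfold1 = D == B;
    cfold2 = B == F;
    cfold3 = D == H;
    cfold4 = F == H;
    cfold5 = !cfold1 && !cfold4;
    cfold6 = !cfold3 && !cfold2;

    q[0] = cfold1 && cfold6 ? D : E;
    q[1] = cfold2 && cfold5 ? F : E;
    q[2] = cfold3 && cfold5 ? D : E;
    q[3] = cfold4 && cfold6 ? F : E;

    return ((cfold1) ^ (cfold2)) ? E : F;
}

// Hypothetical helper: bundles the four stored values and the return value.
struct Result {
    std::array<unsigned int, 4> q;
    unsigned int ret;
};

// Hypothetical helper: calls ccblah on a scratch buffer and collects the results.
Result run_ccblah(unsigned int B, unsigned int D, unsigned int E,
                  unsigned int F, unsigned int H)
{
    Result r{};
    r.ret = ccblah(r.q.data(), B, D, E, F, H);
    return r;
}

// Hypothetical self-check covering a few hand-computed input combinations.
void self_test()
{
    // Only D == B holds: q[0] takes D, the return picks E.
    Result a = run_ccblah(1, 1, 9, 2, 3);
    assert((a.q == std::array<unsigned int, 4>{1, 9, 9, 9}) && a.ret == 9);

    // No comparison holds: every store takes E, the return picks F.
    Result b = run_ccblah(1, 2, 9, 3, 4);
    assert((b.q == std::array<unsigned int, 4>{9, 9, 9, 9}) && b.ret == 3);

    // Only F == H holds: q[3] takes F, the return picks F.
    Result c = run_ccblah(1, 2, 9, 5, 5);
    assert((c.q == std::array<unsigned int, 4>{9, 9, 9, 5}) && c.ret == 5);

    // Only B == F holds: q[1] takes F, the return picks E.
    Result d = run_ccblah(7, 3, 9, 7, 4);
    assert((d.q == std::array<unsigned int, 4>{9, 7, 9, 9}) && d.ret == 9);
}
```

Calling `self_test()` from a `main` and building the file at the same `-Os -mcpu=750` settings would give a quick sanity check that a changed code sequence still computes the same results.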
-- 
           Summary: PowerPC - inefficient use of condition register
           Product: gcc
           Version: 4.1.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P2
         Component: rtl-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: astrange at ithinksw dot com
                CC: gcc-bugs at gcc dot gnu dot org
GCC target triplet: powerpc-apple-darwin7.7.0


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20614