[zebes:~] astrange% /usr/local/bin/g++fsf -v
Using built-in specs.
Target: powerpc-apple-darwin7.7.0
Configured with: ../fsfgcc/configure --program-suffix=fsf --enable-languages=c,c++,java,treelang --enable-cpu=750
Thread model: posix
gcc version 4.1.0 20050321 (experimental)

[zebes:~] astrange% /usr/local/bin/g++fsf -fdump-tree-gimple -fdump-tree-optimized -Os -mmultiple -S -mcpu=750 -mtune=750 -dp ccset.cpp

The source is
unsigned int ccblah(unsigned int *q, unsigned int B, unsigned int D,
                    unsigned int E, unsigned int F, unsigned int H)
{
    bool cfold1, cfold2, cfold3, cfold4, cfold5, cfold6;
    cfold1 = D == B;
    cfold2 = B == F;
    cfold3 = D == H;
    cfold4 = F == H;
    cfold5 = !cfold1 && !cfold4;
    cfold6 = !cfold3 && !cfold2;
    q[0] = cfold1 && cfold6 ? D : E;
    q[1] = cfold2 && cfold5 ? F : E;
    q[2] = cfold3 && cfold5 ? D : E;
    q[3] = cfold4 && cfold6 ? F : E;
    return ((cfold1) ^ (cfold2))?E:F;
}

Except for the return statement, this is a simplified testcase of the core of 
Scale2x (http://scale2x.sourceforge.net/).

GCC generates this for the first block:
__Z7ccblahPjjjjjj:
LFB3:
        xor r12,r5,r4   ; 30    *rs6000.md:11675/1      [length = 12]
        subfic r0,r12,0
        adde. r12,r0,r12
        xor r10,r7,r8   ; 27    *rs6000.md:11645/1      [length = 12]
        subfic r0,r10,0
        adde r10,r0,r10
        xor r4,r4,r7    ; 19    *rs6000.md:11645/1      [length = 12]
        subfic r0,r4,0
        adde r4,r0,r4
        xor r11,r5,r8   ; 23    *rs6000.md:11645/1      [length = 12]
        subfic r0,r11,0
        adde r11,r0,r11
        beq- cr0,L27    ; 31    *rs6000.md:13791        [length = 4]
        li r2,0 ; 34    *movsi_internal1/5      [length = 4]
        b L29   ; 151   jump    [length = 4]

The xor/subfic/adde sequences are unnecessary: each one only materializes an 
equality result as 0 or 1 in a GPR. Instead, GCC should use cmpw to set four 
different CR fields ("cmpw cr1,r5,r4", "cmpw cr2,r7,r8", etc.).
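To make the cost concrete, here is a minimal C++ sketch of what one xor/subfic/adde triple computes, modelled on 32-bit values. The helper names eq_via_carry and eq_direct are hypothetical and exist only for this illustration; the model assumes the usual PowerPC semantics (subfic computes ~rA + imm + 1 and records the carry-out in CA, adde adds CA to the sum of its operands).

```cpp
#include <cstdint>

// Hypothetical helper: models "xor x,a,b; subfic r0,x,0; adde x,r0,x".
std::uint32_t eq_via_carry(std::uint32_t a, std::uint32_t b) {
    std::uint32_t x = a ^ b;  // xor: zero iff a == b

    // subfic r0,x,0: r0 = ~x + 0 + 1, with the carry-out stored in CA.
    std::uint64_t wide =
        static_cast<std::uint64_t>(static_cast<std::uint32_t>(~x)) + 1u;
    std::uint32_t r0 = static_cast<std::uint32_t>(wide);        // -x mod 2^32
    std::uint32_t ca = static_cast<std::uint32_t>(wide >> 32);  // 1 iff x == 0

    // adde x,r0,x: r0 + x + CA; r0 + x wraps to zero, leaving just CA.
    return r0 + x + ca;
}

// Hypothetical helper: the same result expressed as one comparison.
std::uint32_t eq_direct(std::uint32_t a, std::uint32_t b) {
    return a == b ? 1u : 0u;
}
```

Both helpers return 1 for equal inputs and 0 otherwise, which is why a single cmpw per comparison would be sufficient here.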

For the return statement, GCC generates:
L47:
        cmpw cr7,r30,r4 ; 127   *cmpsi_internal1        [length = 4]
        stw r0,12(r11)  ; 125   *movsi_internal1/4      [length = 4]
        bne+ cr7,L48    ; 128   *rs6000.md:13791        [length = 4]
        mr r3,r7        ; 131   *movsi_internal1/1      [length = 4]
L48:
        lmw r30,-8(r1)  ; 169   *lmw    [length = 4]
        blr     ; 170   *return_internal_si     [length = 4]

There may be a saving from using crxor on the existing CR bits instead of 
issuing another cmpw.
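As a rough sketch of that idea in C++ terms: if the two equality results are kept as single bits (as they would be in CR fields), the return value can be selected by XOR-ing those bits rather than comparing two materialized booleans again. The CrBits struct and both function names below are hypothetical, used only to model the suggestion; this is not what GCC emits.

```cpp
#include <cstdint>

// Hypothetical model: each compare leaves an EQ bit in its own CR field.
struct CrBits {
    bool eq_db;  // EQ bit from comparing D with B (cfold1)
    bool eq_bf;  // EQ bit from comparing B with F (cfold2)
};

// Perform each comparison exactly once and keep the resulting bits.
CrBits compare_once(std::uint32_t B, std::uint32_t D, std::uint32_t F) {
    return CrBits{D == B, B == F};
}

// Select the return value by XOR-ing the saved bits (the role crxor
// would play), without any further compare.
std::uint32_t select_return(const CrBits &cr, std::uint32_t E,
                            std::uint32_t F) {
    bool differ = cr.eq_db != cr.eq_bf;  // XOR of two single bits
    return differ ? E : F;
}
```

For example, with B = 1, D = 1, F = 2 only the first comparison holds, so the bits differ and E is returned, matching `((cfold1) ^ (cfold2)) ? E : F` in the testcase.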

-- 
           Summary: PowerPC - inefficient use of condition register
           Product: gcc
           Version: 4.1.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P2
         Component: rtl-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: astrange at ithinksw dot com
                CC: gcc-bugs at gcc dot gnu dot org
GCC target triplet: powerpc-apple-darwin7.7.0


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20614
