On Wed, Aug 10, 2011 at 1:40 PM, Richard Guenther
<richard.guent...@gmail.com> wrote:
> On Wed, Aug 10, 2011 at 12:29 PM, Paulo J. Matos <pa...@matos-sorge.com> wrote:
>> Hi,
>>
>> I am having a size optimisation issue with GCC-4.6.1.
>> The problem boils down to the fact that I have no idea on the best way to
>> hint to GCC that a given insn would make more sense someplace else.
>>
>> The C code is simple:
>> int16_t mask(uint32_t a)
>> {
>>     return (x & a) == a;
>> }
>>
>> int16_t is QImode and uint32_t is HImode.
>> After combine the insn chain (which is unmodified all the way to ira) is (in
>> simplified form):
>> regQI 27 <- regQI AH [a]
>> regQI 28 <- regQI AL [a+1]
>> regQI AL <- andQI(regQI 28, memQI(symbolrefQI(x) + 1))
>> regQI AH <- andQI(regQI 27, memQI(symbolrefQI(x)))
>> regQI 30 <- regQI AL
>> regQI 29 <- regQI AH
>> regQI 24 <- 1
>> if regQI 29 != regQI 27
>>     goto labelref 20
>> if regQI 30 != regQI 28
>>     goto labelref 20
>> goto labelref 22
>> labelref 20
>> regQI 24 <- 0
>> labelref 22
>> regQI AL <- regQI 24
>>
>> The problem resides in `regQI 24 <- 1' being before the jumps.
>> Since regQI 24 is going to AL, IRA decides to allocate regQI 24 to AL, which
>> creates loads of conflicts and reloads. If that same insn would be moved to
>> after the jumps and before the `goto labelref 22' then all would be fine
>> cause by then regs 27, 28, 29, 30 are dead.
>>
>> It's obviously hard to point to a solution but I was wondering if there's a
>> way to hint to GCC that moving an insn might help the code issue. Or if I
>> should look into why an existing pass is not already doing that.
>
> On x86 we expand the code to (((xl & al) ^ al) | ((xh & ah) ^ ah)) == 0
> which is then if-converted.
> Modified testcase:
>
> long long x;
> _Bool __attribute__((regparm(2))) mask (long long a)
> {
>   return (x & a) == a;
> }
>
> on i?86 gets you
>
> mask:
> .LFB0:
>         .cfi_startproc
>         pushl   %ebx
>         .cfi_def_cfa_offset 8
>         .cfi_offset 3, -8
>         movl    %eax, %ebx
>         andl    x, %ebx
>         movl    %edx, %ecx
>         andl    x+4, %ecx
>         xorl    %ebx, %eax
>         xorl    %ecx, %edx
>         orl     %edx, %eax
>         sete    %al
>         popl    %ebx
>         .cfi_restore 3
>         .cfi_def_cfa_offset 4
>         ret
>
> so I wonder if you should investigate why the xor variant doesn't trigger
> for you? On i?86 if-conversion probably solves your specific issue,
> but I guess the initial expansion is where you could improve placement
> of the 1 (after all, the 0 is after the jumps).

Oh, and I wonder if/why IRA can/does not rematerialize the constant
instead of spilling it. Might be a cost issue that it doesn't delay
allocating a reg for 1 as that is cheap to reload (is it?).

Richard.

> Richard.
>
>> Cheers,
>>
>> --
>> PMatos
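
As a rough source-level illustration of the two shapes discussed in this thread, the sketch below compares a branchy form, where the result is decided only after both compares, with the xor/or form mentioned for x86. It is not what GCC emits for either target. The 16-bit halves, the names mask_branchy, mask_xor and check_equivalence, and the sample values are all assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Global operand, standing in for the `x' of the testcase (assumed 32-bit here). */
uint32_t x;

/* Branchy form: compare each half separately and decide the result only
   after both compares, so no constant is live across the jumps. */
bool mask_branchy(uint32_t a)
{
    uint16_t al = (uint16_t)a, ah = (uint16_t)(a >> 16);
    uint16_t xl = (uint16_t)x, xh = (uint16_t)(x >> 16);

    if ((xh & ah) != ah)
        return false;
    if ((xl & al) != al)
        return false;
    return true;
}

/* Xor/or form: each xor is zero exactly when the masked half equals the
   original half, so the or of both is zero exactly when (x & a) == a. */
bool mask_xor(uint32_t a)
{
    uint16_t al = (uint16_t)a, ah = (uint16_t)(a >> 16);
    uint16_t xl = (uint16_t)x, xh = (uint16_t)(x >> 16);

    return (((xl & al) ^ al) | ((xh & ah) ^ ah)) == 0;
}

/* Check both forms against the plain expression on a few sample values. */
void check_equivalence(void)
{
    static const uint32_t samples[] = {
        0u, 1u, 0x00010000u, 0x0000ffffu, 0xffff0000u, 0x12345678u, 0xffffffffu
    };
    size_t n = sizeof samples / sizeof samples[0];

    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            x = samples[i];
            uint32_t a = samples[j];
            bool expected = (x & a) == a;
            assert(mask_branchy(a) == expected);
            assert(mask_xor(a) == expected);
        }
    }
}
```

Calling check_equivalence() from a small main is enough to exercise both variants. Whether a given port ends up with the branch-free shape still depends on its expanders and on if-conversion, as discussed above.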