On Wed, Aug 10, 2011 at 1:40 PM, Richard Guenther
<richard.guent...@gmail.com> wrote:
> On Wed, Aug 10, 2011 at 12:29 PM, Paulo J. Matos <pa...@matos-sorge.com> 
> wrote:
>> Hi,
>>
>> I am having a size optimisation issue with GCC-4.6.1.
>> The problem boils down to the fact that I have no idea on the best way to
>> hint to GCC that a given insn would make more sense someplace else.
>>
>> The C code is simple:
>> int16_t mask(uint32_t a)
>> {
>>    return (x & a) == a;
>> }
>>
>> int16_t is QImode and uint32_t is HImode.
>> After combine the insn chain (which is unmodified all the way to ira) is (in
>> simplified form):
>> regQI 27 <- regQI AH [a]
>> regQI 28 <- regQI AL [a+1]
>> regQI AL <- andQI(regQI 28, memQI(symbolrefQI(x) + 1))
>> regQI AH <- andQI(regQI 27, memQI(symbolrefQI(x)))
>> regQI 30 <- regQI AL
>> regQI 29 <- regQI AH
>> regQI 24 <- 1
>> if regQI 29 != regQI 27
>>   goto labelref 20
>> if regQI 30 != regQI 28
>>   goto labelref 20
>> goto labelref 22
>> labelref 20
>> regQI 24 <- 0
>> labelref 22
>> regQI AL <- regQI 24
>>
>> The problem resides in `regQI 24 <- 1' being before the jumps.
>> Since regQI 24 is going to AL, IRA decides to allocate regQI 24 to AL, which
>> creates loads of conflicts and reloads. If that same insn were moved to
>> after the jumps and before the `goto labelref 22' then all would be fine
>> because by then regs 27, 28, 29, 30 are dead.
>>
>> It's obviously hard to point to a solution but I was wondering if there's a
>> way to hint to GCC that moving an insn might help the code issue. Or if I
>> should look into why an existing pass is not already doing that.
>
> On x86 we expand the code to (((xl & al) ^ al) | ((xh & ah) ^ ah)) == 0
> which is then if-converted.  Modified testcase:
>
> long long x;
> _Bool __attribute__((regparm(2))) mask (long long a)
> {
>  return (x & a) == a;
> }
>
> on i?86 gets you
>
> mask:
> .LFB0:
>        .cfi_startproc
>        pushl   %ebx
>        .cfi_def_cfa_offset 8
>        .cfi_offset 3, -8
>        movl    %eax, %ebx
>        andl    x, %ebx
>        movl    %edx, %ecx
>        andl    x+4, %ecx
>        xorl    %ebx, %eax
>        xorl    %ecx, %edx
>        orl     %edx, %eax
>        sete    %al
>        popl    %ebx
>        .cfi_restore 3
>        .cfi_def_cfa_offset 4
>        ret
>
> so I wonder if you should investigate why the xor variant doesn't trigger
> for you?  On i?86 if-conversion probably solves your specific issue,
> but I guess the initial expansion is where you could improve placement
> of the 1 (after all, the 0 is after the jumps).
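
To make the xor variant concrete, here is a minimal C sketch of the
equivalence it relies on. It is only an illustration, not what GCC emits:
the function names are made up, x is passed as a parameter instead of
being a global, and the value is split into two 16-bit halves purely for
the example.

```c
#include <assert.h>
#include <stdint.h>

/* Direct form from the testcase: true when every bit set in a is also
   set in x.  (Hypothetical helper; x is a parameter here.) */
static int mask_direct(uint32_t x, uint32_t a)
{
    return (x & a) == a;
}

/* Branch-free xor form, computed on the low and high halves separately.
   (h & m) ^ m is nonzero exactly when some bit of m is missing from h,
   so or-ing both halves and comparing with 0 gives the same answer
   without any conditional jumps. */
static int mask_xor(uint32_t x, uint32_t a)
{
    uint16_t xl = (uint16_t)x, xh = (uint16_t)(x >> 16);
    uint16_t al = (uint16_t)a, ah = (uint16_t)(a >> 16);

    return (((xl & al) ^ al) | ((xh & ah) ^ ah)) == 0;
}
```

Both functions should agree for every pair of inputs. Since the xor form
has no jumps, there is no constant load sitting in front of a branch for
the allocator to keep live across the comparisons.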

Oh, and I wonder if/why IRA can/does not rematerialize the constant
instead of spilling it.  Might be a cost issue that it doesn't delay
allocating a reg for 1 as that is cheap to reload (is it?).

Richard.

> Richard.
>
>> Cheers,
>>
>> --
>> PMatos
>>
>>
>
