On 14/01/14 14:25, Richard Biener wrote:
> On Tue, Jan 14, 2014 at 3:21 PM, Kyrill Tkachov <[email protected]>
> wrote:
>> Moving to gcc, I accidentally sent it to gcc-patches previously...
>>
>>
>> On 14/01/14 14:09, Richard Biener wrote:
>>>
>>> On Tue, Jan 14, 2014 at 3:03 PM, Kyrill Tkachov <[email protected]>
>>> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> I'm looking into PR 54168 where we end up generating an unnecessary
>>>> extend
>>>> operation on arm.
>>>>
>>>> Given code:
>>>>
>>>> struct b2Body {
>>>> unsigned short flags;
>>>> int type;
>>>> };
>>>>
>>>> static _Bool IsAwake(struct b2Body *b)
>>>> {
>>>> return (b->flags & 2) == 2;
>>>> }
>>>>
>>>>
>>>> int foo(struct b2Body *bA, struct b2Body *bB)
>>>> {
>>>> int typeA = bA->type;
>>>> int typeB = bB->type;
>>>> _Bool activeA = IsAwake(bA) && typeA != 0;
>>>> _Bool activeB = IsAwake(bB) && typeB != 0;
>>>>
>>>> if (!activeA && !activeB)
>>>> {
>>>> return 1;
>>>> }
>>>>
>>>> return 0;
>>>> }
>>>>
>>>> Compiled for arm-none-eabi with -O3 -march=armv7-a
>>>>
>>>> The inlined generated code for IsAwake contains the fragment:
>>>>
>>>> ldrh r0, [r1]
>>>> and r0, r0, #2
>>>> uxth r0, r0
>>>> cmp r0, #0
>>>>
>>>> which contains a redundant extend, which also confuses combine and
>>>> prevents
>>>> the whole thing from being optimised into an ldrh ; ands sequence.
>>>>
>>>> Looking at the tree dumps I notice that after the forwprop pass the types
>>>> of
>>>> the operands in the _7 = _3 & 2; statement turn into short unsigned where
>>>> before that pass they were just ints:
>>>>
>>>> IsAwake (struct b2Body * b)
>>>> {
>>>> short unsigned int _3;
>>>> int _4;
>>>> _Bool _6;
>>>> short unsigned int _7;
>>>>
>>>> <bb 2>:
>>>> _3 = b_2(D)->flags;
>>>> _4 = (int) _3;
>>>> _7 = _3 & 2;
>>>> _6 = _7 != 0;
>>>> return _6;
>>>>
>>>> }
>>>>
>>>>
>>>> I believe the C standard expects the operation to be performed in int
>>>> mode.
>>>> Now, since this is a bitwise and operation with a known constant 2, the
>>>> operation can be safely performed in unsigned short mode. However on
>>>> word-based machines like arm this would introduce unnecessary extend
>>>> operations down the line, as I believe is the case here.
>>>
>>> Though the variant using shorts is clearly shorter (in number of stmts)
>>> and thus easier to optimize. Am I correct that the issue in the end
>>> is that _7 != 0 requires to extend _7? & 2 is trivially performed without
>>> any extension, no?
>>
>>
>> If I look at the dump before forwprop, the number of statements is exactly
>> the same, so it's not any shorter in that sense.
>
> Well, it is - _4 = (int) _3; is dead, thus a zero-extension instruction
> is removed.
>
That's a rather short-sighted definition of removed. You've removed an
extension that:
a) can be merged with the preceding load
b) will have to be put back again anyway, when the _4 & 2 is expanded.
And finally, you end up with a second one when _5 != 0 is later expanded.
R.
>> <Dump from before forwprop>
>>
>> IsAwake (struct b2Body * b)
>> {
>> short unsigned int _3;
>> int _4;
>> int _5;
>> _Bool _6;
>>
>>
>> <bb 2>:
>> _3 = b_2(D)->flags;
>> _4 = (int) _3;
>> _5 = _4 & 2;
>> _6 = _5 != 0;
>> return _6;
>>
>> }
>>
>> Using shorts is not cheaper on an architecture like arm which is word-based.
>> Just the fact that we're introducing shorts already implies we're going to
>> have to extend somewhere.
>
> But given the bit-and with '2' the extension can be removed, no?
>
> Richard.
>
>> Kyrill
>>
>>
>>>
>>> Richard.
>>>
>>>> Anyone have any insight on how to resolve this one?
>>>>
>>>> Thanks,
>>>> Kyrill
>>>>
>>
>>
>