On Mon, Jul 6, 2015 at 7:30 AM, Vineet Gupta <vineet.gup...@synopsys.com> wrote:
> On Friday 03 July 2015 07:15 PM, Richard Biener wrote:
>> On Fri, Jul 3, 2015 at 3:10 PM, Vineet Gupta <vineet.gup...@synopsys.com> wrote:
>>> Hi,
>>>
>>> I have the following test case (reduced from Linux kernel sources), and it
>>> seems gcc is optimizing away the first loop iteration.
>>>
>>> arc-linux-gcc -c -O2 star-9000857057.c -fno-branch-count-reg --save-temps -mA7
>>>
>>> ----------->8-------------
>>> static inline int __test_bit(unsigned int nr, const volatile unsigned long *addr)
>>> {
>>>  unsigned long mask;
>>>
>>>  addr += nr >> 5;
>>> #if 0
>>>     nr &= 0x1f;
>>> #endif
>>>  mask = 1UL << nr;
>>>  return ((mask & *addr) != 0);
>>> }
>>>
>>> int foo (int a, unsigned long *p)
>>> {
>>>   int i;
>>>   for (i = 63; i>=0; i--)
>>>   {
>>>       if (!(__test_bit(i, p)))
>>>            continue;
>>>       a += i;
>>>   }
>>>   return a;
>>> }
>>> ----------->8-------------
>>>
>>> gcc generates the following:
>>>
>>> ----------->8-------------
>>>         .global foo
>>>         .type   foo, @function
>>> foo:
>>>         ld_s r2,[r1,4]  <---- dead code
>>>         mov_s r2,63
>>>         .align 4
>>> .L2:
>>>         sub r2,r2,1    <-----SUB first
>>>         cmp r2,-1
>>>         jeq.d [blink]
>>>         lsr r3,r2,5   <----- BUG: first @mask is (1 << 62) NOT (1 << 63)
>>>         .align 2
>>> .L4:
>>>         ld.as r3,[r1,r3]
>>>         bbit0.nd r3,r2,@.L2
>>>         add_s r0,r0,r2
>>>         sub r2,r2,1
>>>         cmp r2,-1
>>>         bne.d @.L4
>>>         lsr r3,r2,5
>>>         j_s [blink]
>>>         .size   foo, .-foo
>>>         .ident  "GCC: (ARCv2 ISA Linux uClibc toolchain arc-2015.06-rc1-21-g21b2c4b83dfa) 4.8.4"
>>> ----------->8-------------
>>>
>>> For the first 32 loop iterations (i = 63..32), this test is effectively doing
>>> a 64-bit operation, e.g. (1 << 63), in a 32-bit regime. Is this supposed to
>>> be undefined, truncated to zero, or port-specific?
>>>
>>> If it is truncated to zero, then the generated code above is not correct, as
>>> it needs to elide not just the first iteration (corresponding to i = 63) but
>>> all of 63..32.
>>>
>>> Further, the ARCompact ISA provides that instructions taking bitpos operands
>>> (BSET, BCLR, LSL) can take any number whatsoever, but the core will only use
>>> the lower 5 bits (thus clamping the bitpos to 0..31 without the need to do
>>> that in code).
>>>
>>> So is this a gcc bug, or a misinterpretation of the spec?
>> It is the C language standard that says that shifts like this (by a count
>> greater than or equal to the width of the left operand) invoke undefined
>> behavior.
>
> Right, but the compiler is a program nevertheless, and it knows what to do
> when it sees 1 << 62. It's not as if there is an uninitialized variable or
> something that would produce unexpected behaviour.
> More importantly, the question is whether ports can define a specific
> behaviour for such cases, and whether that would be sufficient to guarantee
> the semantics.
>
> The point being that the ARC ISA provides a neat feature where the core only
> considers the lower 5 bits of bitpos operands. Thus we can make such
> behaviour not only deterministic in the context of ARC, but also optimal,
> eliding the need for explicit masking/clamping to 5 bits.

There is SHIFT_COUNT_TRUNCATED, which allows the b & 31 to be combined with
the shift itself if you instead write a << (b & 31).
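
Here is a minimal sketch of that rewrite applied to the test case. The names test_bit_masked and foo_masked are hypothetical. The sketch assumes 32-bit words (uint32_t), as on ARC. With the & 31 written out, every shift count is in 0..31, so the loop is well defined for all of i = 63..0. On a target that defines SHIFT_COUNT_TRUNCATED, GCC may then be able to drop the explicit AND and rely on the hardware ignoring the upper bits of the count.

```c
#include <stdint.h>

/* Hypothetical variant of __test_bit from the test case: the bit index
 * inside the word is masked explicitly, so the shift count is always
 * 0..31 and the shift is well defined for every nr.
 * Assumes 32-bit words (uint32_t), as on ARC. */
static inline int test_bit_masked(unsigned int nr, const volatile uint32_t *addr)
{
    uint32_t mask;

    addr += nr >> 5;                 /* select the word holding bit nr */
    mask = (uint32_t)1 << (nr & 31); /* bit position inside that word */
    return (mask & *addr) != 0;
}

/* Same loop as foo() in the test case, now defined for i = 63..0. */
int foo_masked(int a, const uint32_t *p)
{
    for (int i = 63; i >= 0; i--) {
        if (!test_bit_masked((unsigned int)i, p))
            continue;
        a += i;
    }
    return a;
}
```

For example, take p[0] = p[1] = 0x80000000, which sets bits 31 and 63. Then foo_masked(0, p) returns 31 + 63 = 94, so the i = 63 iteration now has a defined result.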

Of course, a << 63 is still undefined behavior regardless of the target's behavior.
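
The two behaviours asked about earlier in the thread give different answers for such counts. Masking to 5 bits, which is what the ARC core does, turns 1 << 63 into 1 << 31. "Truncate to zero" yields 0. Because C leaves the shift undefined, the choice has to be written into the source. Below is a minimal sketch of the truncate-to-zero alternative. It assumes 32-bit words, and bit32_or_zero is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical helper: 1 shifted left by n within a 32-bit word, or 0
 * when n >= 32. The out-of-range case is handled by the comparison, so
 * no shift by 32 or more is ever evaluated (that would be undefined). */
static inline uint32_t bit32_or_zero(unsigned int n)
{
    return n < 32 ? (uint32_t)1 << n : 0;
}
```

In the kernel-style bitmap from the test case, addr += nr >> 5 already selects the word, so masking the bit index is what matches the intent there. The truncate-to-zero form suits callers that want out-of-range bit numbers to read as "not set".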

Richard.

> -Vineet
