On Mon, Jul 6, 2015 at 7:30 AM, Vineet Gupta <vineet.gup...@synopsys.com> wrote:
> On Friday 03 July 2015 07:15 PM, Richard Biener wrote:
>> On Fri, Jul 3, 2015 at 3:10 PM, Vineet Gupta <vineet.gup...@synopsys.com>
>> wrote:
>>> Hi,
>>>
>>> I have the following test case (reduced from Linux kernel sources) and it
>>> seems gcc is optimizing away the first loop iteration.
>>>
>>> arc-linux-gcc -c -O2 star-9000857057.c -fno-branch-count-reg --save-temps -mA7
>>>
>>> ----------->8-------------
>>> static inline int __test_bit(unsigned int nr, const volatile unsigned long *addr)
>>> {
>>>     unsigned long mask;
>>>
>>>     addr += nr >> 5;
>>> #if 0
>>>     nr &= 0x1f;
>>> #endif
>>>     mask = 1UL << nr;
>>>     return ((mask & *addr) != 0);
>>> }
>>>
>>> int foo (int a, unsigned long *p)
>>> {
>>>     int i;
>>>     for (i = 63; i>=0; i--)
>>>     {
>>>         if (!(__test_bit(i, p)))
>>>             continue;
>>>         a += i;
>>>     }
>>>     return a;
>>> }
>>> ----------->8-------------
>>>
>>> gcc generates the following:
>>>
>>> ----------->8-------------
>>>     .global foo
>>>     .type   foo, @function
>>> foo:
>>>     ld_s    r2,[r1,4]       <---- dead code
>>>     mov_s   r2,63
>>>     .align 4
>>> .L2:
>>>     sub     r2,r2,1         <----- SUB first
>>>     cmp     r2,-1
>>>     jeq.d   [blink]
>>>     lsr     r3,r2,5         <----- BUG: first @mask is (1 << 62) NOT (1 << 63)
>>>     .align 2
>>> .L4:
>>>     ld.as   r3,[r1,r3]
>>>     bbit0.nd r3,r2,@.L2
>>>     add_s   r0,r0,r2
>>>     sub     r2,r2,1
>>>     cmp     r2,-1
>>>     bne.d   @.L4
>>>     lsr     r3,r2,5
>>>     j_s     [blink]
>>>     .size   foo, .-foo
>>>     .ident  "GCC: (ARCv2 ISA Linux uClibc toolchain arc-2015.06-rc1-21-g21b2c4b83dfa) 4.8.4"
>>> ----------->8-------------
>>>
>>> For the first 32 loop iterations, this test is effectively doing a 64-bit
>>> operation, e.g. (1 << 63), in a 32-bit regime. Is this supposed to be
>>> undefined, truncated to zero, or port specific?
>>>
>>> If it is truncated to zero, then the generated code above is not correct, as
>>> it needs to elide not just the first iteration (corresponding to i = 63) but
>>> all of 63..32.
>>>
>>> Further, the ARCompact ISA provides that instructions involving bitpos
>>> operands (BSET, BCLR, LSL) can take any number whatsoever, but the core will
>>> only use the lower 5 bits (thus clamping the bitpos to 0..31 without the need
>>> to do that in code).
>>>
>>> So is this a gcc bug, or some spec misinterpretation?
>> It is the C language standard that says that shifts like this invoke
>> undefined behavior.
>
> Right, but the compiler is a program nevertheless, and it knows what to do
> when it sees 1 << 62.
> It's not like there is an uninitialized variable or something which will
> provide unexpected behaviour.
> More importantly, the question is whether ports can define a specific
> behaviour for such cases, and whether that would be sufficient to guarantee
> the semantics.
>
> The point being, the ARC ISA provides a neat feature where the core only
> considers the lower 5 bits of bitpos operands. Thus we can make such
> behaviour not only deterministic in the context of ARC, but also optimal,
> eliding the need for explicit masking/clamping to 5 bits.
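A minimal sketch, assuming only the hardware behaviour described above (the core uses the low 5 bits of a bit-position operand): in portable C the same effect has to be written out with an explicit mask, because a plain shift by 32 or more is undefined regardless of what the hardware does. The names lsl_low5 and lsl_low5_demo are hypothetical, not part of GCC or the kernel.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the thread): a portable C model of a
 * 32-bit shift whose count is truncated to its low 5 bits, as described
 * in the thread for ARC LSL/BSET/BCLR bit-position operands.  Because the
 * count is masked explicitly, x << (n & 31) is defined for every n,
 * whereas a plain x << n with n >= 32 is undefined behaviour in C.
 */
static uint32_t lsl_low5(uint32_t x, uint32_t n)
{
    return x << (n & 31u);
}

/* Example usage: bit positions 32..63 alias onto 0..31. */
static void lsl_low5_demo(void)
{
    assert(lsl_low5(1u, 63u) == UINT32_C(0x80000000)); /* 63 & 31 == 31 */
    assert(lsl_low5(1u, 32u) == 1u);                    /* 32 & 31 == 0  */
    assert(lsl_low5(1u, 5u) == 0x20u);
}
```

With the mask in place, bit positions 32..63 alias onto 0..31, which is the aliasing a 5-bit hardware shifter produces; whether the compiler can then drop the AND on a given target is a separate, port-level question.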
There is SHIFT_COUNT_TRUNCATED, which allows you to combine the b & 31 with
the shift if you instead write a << (b & 31).

Of course, a << 63 is still undefined behavior regardless of target behavior.

Richard.

> -Vineet
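Following the suggestion above, here is a sketch of the test case with every shift made well defined: the nr &= 0x1f that the original disabled under #if 0 is folded into the shift as nr & 31. It assumes 32-bit bitmap words (uint32_t here, standing in for the 32-bit unsigned long on ARC) so that the nr >> 5 word index matches; test_bit32, foo_fixed and foo_fixed_demo are hypothetical names. On a target that defines SHIFT_COUNT_TRUNCATED, GCC may be able to merge the AND into the shift instruction, though whether it does for a particular port and version is not established here.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Well-defined variant of the thread's __test_bit (hypothetical name).
 * Assumes 32-bit bitmap words, which is what the "nr >> 5" indexing
 * implies.  The "& 31" keeps every shift count in 0..31, so no shift
 * is undefined.
 */
static inline int test_bit32(unsigned int nr, const volatile uint32_t *addr)
{
    uint32_t mask;

    addr += nr >> 5;                  /* select the 32-bit word holding bit nr */
    mask = UINT32_C(1) << (nr & 31u); /* select the bit within that word */
    return (mask & *addr) != 0;
}

/* Same loop as foo() in the test case: adds the index of every set bit in 0..63. */
static int foo_fixed(int a, const uint32_t *p)
{
    int i;

    for (i = 63; i >= 0; i--) {
        if (!test_bit32((unsigned int)i, p))
            continue;
        a += i;
    }
    return a;
}

/* Example usage: bits 5, 32 and 63 are set, so the sum is 5 + 32 + 63 = 100. */
static void foo_fixed_demo(void)
{
    const uint32_t bits[2] = { UINT32_C(1) << 5, UINT32_C(0x80000001) };

    assert(foo_fixed(0, bits) == 100);
}
```

Unlike the original, i = 63 now tests bit 31 of the second word rather than shifting by 63, so the compiler has no undefined behaviour to exploit when optimizing the loop.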