Hi, I have the following test case (reduced from Linux kernel sources) and it seems gcc is optimizing away the first loop iteration.

arc-linux-gcc -c -O2 star-9000857057.c -fno-branch-count-reg --save-temps -mA7

----------->8-------------
static inline int __test_bit(unsigned int nr, const volatile unsigned long *addr)
{
	unsigned long mask;

	addr += nr >> 5;
#if 0
	nr &= 0x1f;
#endif
	mask = 1UL << nr;
	return ((mask & *addr) != 0);
}

int foo (int a, unsigned long *p)
{
	int i;
	for (i = 63; i>=0; i--)
	{
		if (!(__test_bit(i, p)))
			continue;
		a += i;
	}
	return a;
}
----------->8-------------

gcc generates the following:

----------->8-------------
	.global	foo
	.type	foo, @function
foo:
	ld_s r2,[r1,4]		<---- dead code
	mov_s r2,63
	.align 4
.L2:
	sub r2,r2,1		<----- SUB first
	cmp r2,-1
	jeq.d [blink]
	lsr r3,r2,5		<----- BUG: first @mask is (1 << 62) NOT (1 << 63)
	.align 2
.L4:
	ld.as r3,[r1,r3]
	bbit0.nd r3,r2,@.L2
	add_s r0,r0,r2
	sub r2,r2,1
	cmp r2,-1
	bne.d @.L4
	lsr r3,r2,5
	j_s [blink]
	.size	foo, .-foo
	.ident	"GCC: (ARCv2 ISA Linux uClibc toolchain arc-2015.06-rc1-21-g21b2c4b83dfa) 4.8.4"
----------->8-------------

For the first 32 loop iterations (i = 63..32), this test is effectively
doing a 64-bit operation, e.g. (1 << 63), in a 32-bit regime. Is this
supposed to be undefined, truncated to zero, or port specific?

If it is truncated to zero, then the generated code above is not correct,
as it needs to elide not just the first iteration (corresponding to i = 63)
but all of 63..32.

Further, the ARCompact ISA provides that instructions involving bitpos
operands (BSET, BCLR, LSL) can take any number whatsoever, but the core will
only use the lower 5 bits (so the bitpos is clamped to 0..31 without any
need to do that in code).

So is this a gcc bug, or some spec misinterpretation?

TIA,
-Vineet
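
P.S. For comparison, here is a minimal sketch of the variant with the
masking from the `#if 0` block enabled. The names test_bit_masked and
foo_masked are made up for this sketch, and uint32_t is an assumption
standing in for the 32-bit unsigned long of the ARC target so that the
sketch behaves the same on a 64-bit host. With the mask in place the shift
count always stays within 0..31, so the (1 << 63) question does not arise.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical masked variant of __test_bit() from the test case.
 * Assumption: uint32_t models the 32-bit unsigned long of the target.
 */
static inline int test_bit_masked(unsigned int nr, const volatile uint32_t *addr)
{
	assert(addr != 0);

	addr += nr >> 5;	/* select the 32-bit word that holds bit nr */
	nr &= 0x1f;		/* clamp the shift count to 0..31 */
	return ((UINT32_C(1) << nr) & *addr) != 0;
}

/* Same loop as foo() in the test case: adds the index of every set bit to a. */
int foo_masked(int a, const uint32_t *p)
{
	int i;

	for (i = 63; i >= 0; i--) {
		if (!test_bit_masked((unsigned int)i, p))
			continue;
		a += i;
	}
	return a;
}
```

Here p is expected to point at two 32-bit words, with bit 63 being the top
bit of the second word.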