Hi,

I have the following test case (reduced from Linux kernel sources) and it seems
gcc is optimizing away the first loop iteration.

arc-linux-gcc -c -O2 star-9000857057.c -fno-branch-count-reg --save-temps -mA7

----------->8-------------
static inline int __test_bit(unsigned int nr, const volatile unsigned long *addr)
{
 unsigned long mask;

 addr += nr >> 5;
#if 0
    nr &= 0x1f;
#endif
 mask = 1UL << nr;
 return ((mask & *addr) != 0);
}

int foo (int a, unsigned long *p)
{
  int i;
  for (i = 63; i>=0; i--)
  {
      if (!(__test_bit(i, p)))
           continue;
      a += i;
  }
  return a;
}
----------->8-------------

gcc generates the following:

----------->8-------------
        .global foo
        .type   foo, @function
foo:
        ld_s r2,[r1,4]  <---- dead code
        mov_s r2,63     
        .align 4
.L2:
        sub r2,r2,1    <-----SUB first
        cmp r2,-1
        jeq.d [blink]
        lsr r3,r2,5   <----- BUG: first @mask is (1 << 62) NOT (1 << 63)
        .align 2
.L4:
        ld.as r3,[r1,r3]
        bbit0.nd r3,r2,@.L2
        add_s r0,r0,r2
        sub r2,r2,1
        cmp r2,-1
        bne.d @.L4
        lsr r3,r2,5
        j_s [blink]
        .size   foo, .-foo
        .ident  "GCC: (ARCv2 ISA Linux uClibc toolchain arc-2015.06-rc1-21-g21b2c4b83dfa) 4.8.4"
----------->8-------------

For the first 32 loop iterations (i = 63..32), this test is effectively doing a
64-bit operation, e.g. (1UL << 63), in a 32-bit regime. Is this supposed to be
undefined, truncated to zero, or port specific?

If it is truncated to zero, then the generated code above is not correct, as it
needs to elide not just the first iteration (corresponding to i = 63) but all
of the iterations for i = 63..32.

Further, the ARCompact ISA provides that instructions involving bitpos operands
(BSET, BCLR, LSL) can take any number whatsoever, but the core will only use the
lower 5 bits (so the bitpos is clamped to 0..31 w/o any need for doing that in
code).
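
For comparison, here is a minimal sketch of the same test with the
"nr &= 0x1f" mask from the #if 0 block enabled, so the shift count always stays
within 0..31 and the width question does not arise. The names test_bit_masked
and foo_masked, and the use of uint32_t to stand in for the target's 32-bit
unsigned long, are my assumptions for illustration only, not the kernel code.

```c
#include <stdint.h>

/* Hypothetical variant of __test_bit: uint32_t stands in for the
 * 32-bit unsigned long of the ARC target (an assumption). */
static inline int test_bit_masked(unsigned int nr, const volatile uint32_t *addr)
{
    addr += nr >> 5;   /* select the 32-bit word that holds bit nr */
    nr &= 0x1f;        /* keep the shift count within 0..31 */
    return ((UINT32_C(1) << nr) & *addr) != 0;
}

/* Same loop as foo(), summing the indices of all set bits in p[0..1]. */
int foo_masked(int a, const uint32_t *p)
{
    int i;
    for (i = 63; i >= 0; i--) {
        if (!test_bit_masked((unsigned int)i, p))
            continue;
        a += i;
    }
    return a;
}
```

With this variant, bit 63 is read as bit 31 of p[1], so the i = 63 iteration
contributes to the result whenever that bit is set.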

So is this a gcc bug, or some spec misinterpretation?

TIA,
-Vineet
