https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100622

            Bug ID: 100622
           Summary: Conversion to smaller unsigned type in loop
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tkoenig at gcc dot gnu.org
  Target Milestone: ---

Consider

unsigned int foo(unsigned int *a, int n)
{
  int i;
  unsigned int res = 0;
  for (i=0; i<n; i++)
    res += a[i];

  return res;
}

unsigned int foo2 (unsigned int *a, int n)
{
  int i;
  unsigned long res = 0;
  for (i=0; i<n; i++)
    res += a[i];

  return res;
}

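Unsigned arithmetic is modular (mod 2^32 for unsigned int, mod 2^64 for
unsigned long on this target), and truncating the 64-bit sum to 32 bits on
return gives the same value as summing mod 2^32 throughout.  So these two
functions are identical in effect.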
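
As a quick sanity check of that claim, here is a minimal sketch (not part of
the original test case).  The helper check_equivalent and the test data are
made up for illustration, and the comparison is only interesting where
unsigned long is wider than unsigned int, as on powerpc64le.

```c
#include <assert.h>

static unsigned int foo(unsigned int *a, int n)
{
  int i;
  unsigned int res = 0;          /* accumulates mod 2^32 */
  for (i=0; i<n; i++)
    res += a[i];

  return res;
}

static unsigned int foo2(unsigned int *a, int n)
{
  int i;
  unsigned long res = 0;         /* wider accumulator (assumed 64 bits) */
  for (i=0; i<n; i++)
    res += a[i];

  return res;                    /* truncated to unsigned int on return */
}

/* Hypothetical helper: compare both variants on every prefix of an
   array whose partial sums overflow 32 bits.  */
static void check_equivalent(void)
{
  unsigned int a[] = { 0xffffffffu, 0xfffffff0u, 3u, 0x80000000u };
  int n = sizeof a / sizeof a[0];
  int i;

  for (i=0; i<=n; i++)
    assert(foo(a, i) == foo2(a, i));
}
```

Calling check_equivalent() from main should pass silently; it only
exercises one hand-picked input, so it illustrates the claim rather than
proving it.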

On POWER, a reasonably recent trunk compiles this as follows (using -O1
rather than a higher level to avoid loop unrolling, which makes the loops
easier to read):

$ gcc -O1 -c add.c && objdump --disassemble add.o

add.o:     file format elf64-powerpcle


Disassembly of section .text:

0000000000000000 <foo>:
   0:   00 00 04 2c     cmpwi   r4,0
   4:   30 00 81 40     ble     34 <foo+0x34>
   8:   20 00 89 78     clrldi  r9,r4,32
   c:   fc ff 43 39     addi    r10,r3,-4
  10:   00 00 60 38     li      r3,0
  14:   a6 03 29 7d     mtctr   r9
  18:   04 00 0a 85     lwzu    r8,4(r10)
  1c:   14 1a 68 7c     add     r3,r8,r3
  20:   20 00 63 78     clrldi  r3,r3,32
  24:   ff ff 29 39     addi    r9,r9,-1
  28:   20 00 29 79     clrldi  r9,r9,32
  2c:   ec ff 00 42     bdnz    18 <foo+0x18>
  30:   20 00 80 4e     blr
  34:   00 00 60 38     li      r3,0
  38:   20 00 80 4e     blr
        ...

0000000000000048 <foo2>:
  48:   00 00 04 2c     cmpwi   r4,0
  4c:   30 00 81 40     ble     7c <foo2+0x34>
  50:   20 00 89 78     clrldi  r9,r4,32
  54:   fc ff 43 39     addi    r10,r3,-4
  58:   00 00 60 38     li      r3,0
  5c:   a6 03 29 7d     mtctr   r9
  60:   04 00 0a 85     lwzu    r8,4(r10)
  64:   14 42 63 7c     add     r3,r3,r8
  68:   ff ff 29 39     addi    r9,r9,-1
  6c:   20 00 29 79     clrldi  r9,r9,32
  70:   f0 ff 00 42     bdnz    60 <foo2+0x18>
  74:   20 00 63 78     clrldi  r3,r3,32
  78:   20 00 80 4e     blr
  7c:   00 00 60 38     li      r3,0
  80:   f4 ff ff 4b     b       74 <foo2+0x2c>

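So foo has an extra instruction inside the loop (clrldi r3,r3,32 at 0x20)
that masks the result of the addition on every iteration, whereas foo2
masks only once, after the loop (at 0x74).  The mask inside the loop
should not be needed.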
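
To illustrate why the in-loop mask is redundant, here is a rough C model
of the two loop shapes, using a uint64_t to stand in for the 64-bit
register.  The function names are hypothetical and this is only a sketch
of the arithmetic, not of what the compiler does internally.

```c
#include <assert.h>
#include <stdint.h>

/* Models the loop in foo: the accumulator is cleared to its low
   32 bits after every addition.  */
static uint64_t sum_mask_each(const uint32_t *a, int n)
{
  uint64_t r = 0;
  int i;
  for (i=0; i<n; i++)
    r = (r + a[i]) & 0xffffffffu;
  return r;
}

/* Models the loop in foo2: add at full width, mask once at the end.  */
static uint64_t sum_mask_once(const uint32_t *a, int n)
{
  uint64_t r = 0;
  int i;
  for (i=0; i<n; i++)
    r += a[i];
  return r & 0xffffffffu;
}

/* Hypothetical check on one input whose sum exceeds 32 bits.  */
static void check_mask_placement(void)
{
  uint32_t a[] = { 0xffffffffu, 2u, 0x80000000u };
  assert(sum_mask_each(a, 3) == sum_mask_once(a, 3));
}
```

Since the upper bits never feed back into the lower 32 bits of an
addition, both shapes return the same value, and only the final mask
matters.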
