https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100622
            Bug ID: 100622
           Summary: Conversion to smaller unsigned type in loop
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tkoenig at gcc dot gnu.org
  Target Milestone: ---

Consider

unsigned int foo(unsigned int *a, int n)
{
  int i;
  unsigned int res = 0;
  for (i=0; i<n; i++)
    res += a[i];
  return res;
}

unsigned int foo2 (unsigned int *a, int n)
{
  int i;
  unsigned long res = 0;
  for (i=0; i<n; i++)
    res += a[i];
  return res;
}

Given modular 2^n arithmetic, these two functions are identical
in effect.

On POWER with a reasonably recent trunk, this gets compiled to the
following (using -O1 in order to avoid loop unrolling, for better
visibility):

$ gcc -O1 -c add.c && objdump --disassemble add.o

add.o:     file format elf64-powerpcle


Disassembly of section .text:

0000000000000000 <foo>:
   0:   00 00 04 2c     cmpwi   r4,0
   4:   30 00 81 40     ble     34 <foo+0x34>
   8:   20 00 89 78     clrldi  r9,r4,32
   c:   fc ff 43 39     addi    r10,r3,-4
  10:   00 00 60 38     li      r3,0
  14:   a6 03 29 7d     mtctr   r9
  18:   04 00 0a 85     lwzu    r8,4(r10)
  1c:   14 1a 68 7c     add     r3,r8,r3
  20:   20 00 63 78     clrldi  r3,r3,32
  24:   ff ff 29 39     addi    r9,r9,-1
  28:   20 00 29 79     clrldi  r9,r9,32
  2c:   ec ff 00 42     bdnz    18 <foo+0x18>
  30:   20 00 80 4e     blr
  34:   00 00 60 38     li      r3,0
  38:   20 00 80 4e     blr
        ...

0000000000000048 <foo2>:
  48:   00 00 04 2c     cmpwi   r4,0
  4c:   30 00 81 40     ble     7c <foo2+0x34>
  50:   20 00 89 78     clrldi  r9,r4,32
  54:   fc ff 43 39     addi    r10,r3,-4
  58:   00 00 60 38     li      r3,0
  5c:   a6 03 29 7d     mtctr   r9
  60:   04 00 0a 85     lwzu    r8,4(r10)
  64:   14 42 63 7c     add     r3,r3,r8
  68:   ff ff 29 39     addi    r9,r9,-1
  6c:   20 00 29 79     clrldi  r9,r9,32
  70:   f0 ff 00 42     bdnz    60 <foo2+0x18>
  74:   20 00 63 78     clrldi  r3,r3,32
  78:   20 00 80 4e     blr
  7c:   00 00 60 38     li      r3,0
  80:   f4 ff ff 4b     b       74 <foo2+0x2c>

so foo has an extra instruction inside the loop (the clrldi r3,r3,32
at offset 0x20) to mask the result of each addition, whereas foo2
masks the sum only once, after the loop (at offset 0x74).  The mask
inside the loop should not be needed.
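
To illustrate why the two functions agree, here is a minimal sketch
with the widths spelled out.  It assumes that unsigned int is 32 bits
and unsigned long is 64 bits, as on powerpc64le, and uses fixed-width
types to make that explicit.  The names sum_narrow and sum_wide are
hypothetical stand-ins for foo and foo2, not part of the test case.

```c
#include <stdint.h>

/* Like foo: accumulate in 32 bits, so every addition wraps mod 2^32. */
uint32_t sum_narrow(const uint32_t *a, int n)
{
  uint32_t res = 0;
  for (int i = 0; i < n; i++)
    res += a[i];
  return res;
}

/* Like foo2: accumulate in 64 bits and truncate once on return.
   The low 32 bits of a sum depend only on the low 32 bits of the
   operands, so truncating at the end gives the same value as
   truncating after every addition.  */
uint32_t sum_wide(const uint32_t *a, int n)
{
  uint64_t res = 0;
  for (int i = 0; i < n; i++)
    res += a[i];
  return (uint32_t) res;
}
```

For any input, sum_narrow (a, n) == sum_wide (a, n) should hold,
including when the 32-bit sum wraps around several times.  This is the
property that would let the mask be sunk out of the loop in foo.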