http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60145
Bug ID: 60145
Summary: [AVR] Suboptimal code for byte order shuffling using
shift and or
Product: gcc
Version: 4.8.2
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: other
Assignee: unassigned at gcc dot gnu.org
Reporter: matthijs at stdin dot nl
(Not sure what the component should be, just selected "other" for now)
Using shifts and bitwise OR to compose multiple bytes into a larger integer
results in suboptimal code on AVR.
For example, consider a few simple functions that take two or four bytes and
compose them into (big-endian) integers. Since AVR is an 8-bit platform,
this essentially just means moving the bytes from the argument registers
to the return-value registers. However, the generated assembly is
significantly larger than that and contains obvious optimization
opportunities.
The example below also contains a version that uses a union to compose the
integer, which is optimized as expected (but it only works on little-endian
systems, since it relies on the native byte order of uint16_t).
matthijs@grubby:~$ cat foo.c
#include <stdint.h>

uint16_t join2(uint8_t a, uint8_t b) {
    return ((uint16_t)a << 8) | b;
}

uint16_t join2_efficient(uint8_t a, uint8_t b) {
    union {
        uint16_t uint;
        uint8_t arr[2];
    } tmp = {.arr = {b, a}};
    return tmp.uint;
}

uint32_t join4(uint8_t a, uint8_t b, uint8_t c, uint8_t d) {
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
           ((uint32_t)c << 8) | d;
}
matthijs@grubby:~$ avr-gcc -c foo.c -O3 && avr-objdump -d foo.o
foo.o: file format elf32-avr
Disassembly of section .text:
00000000 <join2>:
0: 70 e0 ldi r23, 0x00 ; 0
2: 26 2f mov r18, r22
4: 37 2f mov r19, r23
6: 38 2b or r19, r24
8: 82 2f mov r24, r18
a: 93 2f mov r25, r19
c: 08 95 ret
0000000e <join2_efficient>:
e: 98 2f mov r25, r24
10: 86 2f mov r24, r22
12: 08 95 ret
00000014 <join4>:
14: 0f 93 push r16
16: 1f 93 push r17
18: 02 2f mov r16, r18
1a: 10 e0 ldi r17, 0x00 ; 0
1c: 20 e0 ldi r18, 0x00 ; 0
1e: 30 e0 ldi r19, 0x00 ; 0
20: 14 2b or r17, r20
22: 26 2b or r18, r22
24: 38 2b or r19, r24
26: 93 2f mov r25, r19
28: 82 2f mov r24, r18
2a: 71 2f mov r23, r17
2c: 60 2f mov r22, r16
2e: 1f 91 pop r17
30: 0f 91 pop r16
32: 08 95 ret
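For reference, a possible workaround (not part of the original report, and the
function name join2_memcpy is made up here) is to build the bytes in an array
and memcpy them into the result. Like the union version it assumes a
little-endian target such as AVR, but it avoids type punning through a union.
Whether avr-gcc actually compiles it to the minimal two-mov sequence would
need to be verified; this sketch only demonstrates that it computes the same
value as the shift-and-or version:

    #include <stdint.h>
    #include <string.h>
    #include <assert.h>

    /* Shift-and-or version from the report: fully portable, but
       currently compiles to suboptimal AVR code. */
    uint16_t join2(uint8_t a, uint8_t b) {
        return ((uint16_t)a << 8) | b;
    }

    /* Hypothetical memcpy-based workaround: stores the bytes in
       little-endian order and copies them into the result, so it
       only works on little-endian targets (AVR is one). */
    uint16_t join2_memcpy(uint8_t a, uint8_t b) {
        uint8_t bytes[2] = { b, a };  /* low byte first */
        uint16_t out;
        memcpy(&out, bytes, sizeof out);
        return out;
    }

    int main(void) {
        assert(join2(0x12, 0x34) == 0x1234);
        /* holds on any little-endian host, e.g. x86 or AVR */
        assert(join2_memcpy(0x12, 0x34) == 0x1234);
        return 0;
    }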