https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71056
Bug ID: 71056
Summary: __builtin_bswap32 NEON instruction error with -O3
Product: gcc
Version: 6.1.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: yyc1992 at gmail dot com
Target Milestone: ---

The following code triggers a "NEON instructions not enabled" error when compiled with `gcc -march=armv7-a -mfloat-abi=hard -mfpu=vfpv3-d16 -O3 -o /dev/null -c a.c` on ARM with gcc 6.1.1 (ArchLinuxARM).

```c
#include <string.h>
#include <stdint.h>

extern char *buff;
int f2();
struct T1 {
    int32_t reserved[2];
    uint32_t ip;
    uint16_t cs;
    uint16_t rsrv2;
};
void f3(const char *p)
{
    struct T1 x;
    memcpy(&x, p, sizeof(struct T1));
    x.reserved[0] = __builtin_bswap32(x.reserved[0]);
    x.reserved[1] = __builtin_bswap32(x.reserved[1]);
    x.ip = __builtin_bswap32(x.ip);
    x.cs = x.cs << 8 | x.cs >> 8;
    x.rsrv2 = x.rsrv2 << 8 | x.rsrv2 >> 8;
    if (f2()) {
        memcpy(buff, "\n", 1);
    }
}
```

Error message:

```
alarm% gcc -march=armv7-a -mfloat-abi=hard -mfpu=vfpv3-d16 -O3 -o /dev/null -c a.c
a.c: In function ‘f3’:
a.c:16:21: fatal error: You must enable NEON instructions (e.g. -mfloat-abi=softfp -mfpu=neon) to use these intrinsics.
     x.reserved[0] = __builtin_bswap32(x.reserved[0]);
                     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
```

Note that NEON isn't enabled and the code makes no direct use of NEON instructions or intrinsics, so the NEON instruction must have been introduced by the optimizer.

Seemingly subtle changes make the error disappear, including:

1. -O3 -> -O2 (ok, this one is not particularly subtle)
2. Remove any of the byteswaps or fields
3. Remove either memcpy
4. Make the second memcpy unconditional
5. Remove `f2()` (but keep the memcpy conditional in some other way)
6. Pass in `x` as an argument (either by value or by pointer)

The asm generated when compiling with `-mfpu=neon`:

```
f3:
        @ args = 0, pretend = 0, frame = 16
        @ frame_needed = 0, uses_anonymous_args = 0
        mov     r3, r0
        str     lr, [sp, #-4]!
        sub     sp, sp, #20
        ldr     r2, [r3, #8]    @ unaligned
        ldr     r1, [r3, #4]    @ unaligned
        mov     ip, sp
        ldr     r0, [r0]        @ unaligned
        ldr     r3, [r3, #12]   @ unaligned
        stmia   ip!, {r0, r1, r2, r3}
        mov     r3, r2
        ldrh    ip, [sp, #12]
        rev     r3, r3
        ldrh    r0, [sp, #14]
        vldr    d16, [sp]
        lsr     r1, ip, #8
        str     r3, [sp, #8]
        vrev32.8        d16, d16
        lsr     r2, r0, #8
        orr     r2, r2, r0, lsl #8
        orr     r1, r1, ip, lsl #8
        strh    r2, [sp, #14]   @ movhi
        strh    r1, [sp, #12]   @ movhi
        vstr    d16, [sp]
        bl      f2
        cmp     r0, #0
        movwne  r3, #:lower16:buff
        movtne  r3, #:upper16:buff
        movne   r2, #10
        ldrne   r3, [r3]
        strbne  r2, [r3]
        add     sp, sp, #20
        @ sp needed
        ldr     pc, [sp], #4
        .size   f3, .-f3
        .ident  "GCC: (GNU) 6.1.1 20160501"
```

It seems that the NEON instruction it wants to generate is `vrev32.8`.

The case is simplified from
https://github.com/llvm-mirror/llvm/blob/da4b82ab1387da8c959a4e2439bce10b9cefbc8a/tools/llvm-objdump/MachODump.cpp#L8240-L8263

I don't remember seeing this on gcc 5.
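
For reference, `vrev32.8` reverses the bytes inside each 32-bit word of its operand, so one instruction on a 64-bit `d` register does the work of the two `__builtin_bswap32` calls on `x.reserved[0]` and `x.reserved[1]`. That is consistent with the `vldr d16, [sp]` / `vrev32.8 d16, d16` / `vstr d16, [sp]` sequence in the asm above. A minimal scalar sketch of that equivalence follows, assuming GCC or Clang for `__builtin_bswap32`; the helper names `model_vrev32_8` and `check_pair` are made up for illustration and are not part of GCC.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of `vrev32.8 dN, dM` on a 64-bit register held as 8 bytes:
   reverse the byte order inside each 32-bit word independently. */
static void model_vrev32_8(uint8_t d[8])
{
    for (int w = 0; w < 8; w += 4) {
        uint8_t b0 = d[w], b1 = d[w + 1];
        d[w]     = d[w + 3];
        d[w + 1] = d[w + 2];
        d[w + 2] = b1;
        d[w + 3] = b0;
    }
}

/* Hypothetical helper: swap a pair of words through the model and
   assert that each lane matches a separate __builtin_bswap32 call. */
static void check_pair(uint32_t a, uint32_t b, uint32_t out[2])
{
    uint32_t pair[2] = { a, b };
    uint8_t d[8];

    memcpy(d, pair, sizeof d);   /* plays the role of vldr d16, [sp] */
    model_vrev32_8(d);           /* plays the role of vrev32.8 d16, d16 */
    memcpy(out, d, sizeof d);    /* plays the role of vstr d16, [sp] */

    assert(out[0] == __builtin_bswap32(a));
    assert(out[1] == __builtin_bswap32(b));
}
```

Calling `check_pair(0x11223344u, 0x55667788u, out)` leaves `0x44332211` and `0x88776655` in `out`; the result does not depend on endianness because each word's bytes are reversed in place.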