https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71056

            Bug ID: 71056
           Summary: __builtin_bswap32 NEON instruction error with -O3
           Product: gcc
           Version: 6.1.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: yyc1992 at gmail dot com
  Target Milestone: ---

The following code triggers a "NEON instructions not enabled" error when
compiled with `gcc -march=armv7-a -mfloat-abi=hard -mfpu=vfpv3-d16 -O3 -o
/dev/null -c a.c` on ARM with gcc 6.1.1 (ArchLinuxARM).

```c
#include <string.h>
#include <stdint.h>

extern char *buff;
int f2();
struct T1 {
    int32_t reserved[2];
    uint32_t ip;
    uint16_t cs;
    uint16_t rsrv2;
};
void f3(const char *p)
{
    struct T1 x;
    memcpy(&x, p, sizeof(struct T1));
    x.reserved[0] = __builtin_bswap32(x.reserved[0]);
    x.reserved[1] = __builtin_bswap32(x.reserved[1]);
    x.ip = __builtin_bswap32(x.ip);
    x.cs = x.cs << 8 | x.cs >> 8;
    x.rsrv2 = x.rsrv2 << 8 | x.rsrv2 >> 8;
    if (f2()) {
        memcpy(buff, "\n", 1);
    }
}
```

Error message:

```
alarm% gcc -march=armv7-a -mfloat-abi=hard -mfpu=vfpv3-d16 -O3 -o /dev/null -c a.c
a.c: In function ‘f3’:
a.c:16:21: fatal error: You must enable NEON instructions (e.g. -mfloat-abi=softfp -mfpu=neon) to use these intrinsics.
     x.reserved[0] = __builtin_bswap32(x.reserved[0]);
                     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
```

Note that NEON isn't enabled (`-mfpu=vfpv3-d16`) and the code makes no direct
use of NEON instructions or intrinsics, so the NEON instruction must have been
introduced by the optimizer.

Seemingly subtle changes make the error disappear, including:

1. -O3 -> -O2 (ok, this one is not particularly subtle)
2. Remove any of the byte swaps or fields
3. Remove either memcpy
4. Make the second memcpy unconditional
5. Remove `f2()` (but keep the memcpy conditional in some other way)
6. Pass in `x` as an argument (either by value or by pointer)

The asm generated when compiling with `-mfpu=neon` instead:

```
f3:
        @ args = 0, pretend = 0, frame = 16
        @ frame_needed = 0, uses_anonymous_args = 0
        mov     r3, r0
        str     lr, [sp, #-4]!
        sub     sp, sp, #20
        ldr     r2, [r3, #8]    @ unaligned
        ldr     r1, [r3, #4]    @ unaligned
        mov     ip, sp
        ldr     r0, [r0]        @ unaligned
        ldr     r3, [r3, #12]   @ unaligned
        stmia   ip!, {r0, r1, r2, r3}
        mov     r3, r2
        ldrh    ip, [sp, #12]
        rev     r3, r3
        ldrh    r0, [sp, #14]
        vldr    d16, [sp]
        lsr     r1, ip, #8
        str     r3, [sp, #8]
        vrev32.8        d16, d16
        lsr     r2, r0, #8
        orr     r2, r2, r0, lsl #8
        orr     r1, r1, ip, lsl #8
        strh    r2, [sp, #14]   @ movhi
        strh    r1, [sp, #12]   @ movhi
        vstr    d16, [sp]
        bl      f2
        cmp     r0, #0
        movwne  r3, #:lower16:buff
        movtne  r3, #:upper16:buff
        movne   r2, #10
        ldrne   r3, [r3]
        strbne  r2, [r3]
        add     sp, sp, #20
        @ sp needed
        ldr     pc, [sp], #4
        .size   f3, .-f3
        .ident  "GCC: (GNU) 6.1.1 20160501"
```

It seems that the NEON instruction it wants to generate is `vrev32.8`.
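
For reference, `vrev32.8` reverses the bytes within each 32-bit element of its
operand, so on a 64-bit D register it performs both `reserved[]` byte swaps in
one instruction, which matches the `vldr`/`vrev32.8`/`vstr` sequence on `[sp]`
above. Below is a minimal portable C sketch of that behaviour. The helper names
(`bswap32_portable`, `vrev32_8_model`, `vrev32_matches_two_bswaps`) are
hypothetical and only for illustration; they are not GCC or ARM APIs, and this
is not what the compiler emits.

```c
#include <stdint.h>
#include <string.h>

/* Portable byte swap of one 32-bit word, i.e. the value that
 * __builtin_bswap32 computes. Hypothetical helper name. */
static uint32_t bswap32_portable(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Model of `vrev32.8 dN, dN`: reverse the bytes inside each 32-bit
 * element of an 8-byte (D-register-sized) buffer, in place. */
static void vrev32_8_model(uint8_t d[8])
{
    for (int word = 0; word < 2; word++) {
        uint8_t *w = d + 4 * word;
        uint8_t t0 = w[0], t1 = w[1];
        w[0] = w[3];
        w[1] = w[2];
        w[2] = t1;
        w[3] = t0;
    }
}

/* Returns 1 if the model agrees with two independent scalar byte swaps,
 * which is the equivalence the vectorizer relies on. */
static int vrev32_matches_two_bswaps(uint32_t a, uint32_t b)
{
    uint32_t in[2] = { a, b };
    uint32_t expect[2] = { bswap32_portable(a), bswap32_portable(b) };
    uint8_t d[8];

    memcpy(d, in, sizeof d);
    vrev32_8_model(d);
    return memcmp(d, expect, sizeof d) == 0;
}
```

The comparison is done on the in-memory bytes, so it holds regardless of the
host's endianness.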

The case is simplified from
https://github.com/llvm-mirror/llvm/blob/da4b82ab1387da8c959a4e2439bce10b9cefbc8a/tools/llvm-objdump/MachODump.cpp#L8240-L8263

I don't remember seeing this on gcc 5.
