[Bug c++/110916] New: [12/13/14 Regression] Architecture-dependent missed optimizations for double swapping

janschultke at googlemail dot com via Gcc-bugs Sat, 05 Aug 2023 15:44:14 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110916


            Bug ID: 110916
           Summary: [12/13/14 Regression] Architecture-dependent missed
                    optimizations for double swapping
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: janschultke at googlemail dot com
  Target Milestone: ---

GCC's ability to eliminate redundant loads and stores depends oddly on the
target architecture. Even within the same architecture family, compiling for
-march=skylake in particular always gives the best result.

On x86_64 with -march=x86-64-v2, GCC 11 produces the optimal output, while
GCC 12, 13, and 14 produce worse output than they do with -march=skylake.

On ARM64, GCC emits a strange load from, and store back to, the same address
([x1]). This happens with every GCC version available on Compiler Explorer.

## Code to Reproduce (https://godbolt.org/z/d7Kcdn8fo)

static void swap(int* restrict a, int* restrict b) {
    const int tmp = *a;
    *a = *b;
    *b = tmp;
}

void double_swap_alias(int* a, int* b) {
    swap(a, b);
    swap(a, b);
}

## Expected Output (x86_64 GCC 14 -O3 -march=skylake)

ret
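
A bare `ret` is correct because two consecutive swaps restore both values. The sketch below checks this at run time. It reuses the reproducer's `swap` and `double_swap_alias` for distinct pointers. Calling the `restrict`-qualified `swap` with `a == b` would be undefined behavior, so the aliased case uses a hypothetical restrict-free copy, `swap_plain`. The names `swap_plain`, `double_swap_plain`, and the two checker functions are illustrative only and do not come from the report.

```c
#include <assert.h>
#include <stdbool.h>

/* swap and double_swap_alias copied from the reproducer (C99 restrict). */
static void swap(int *restrict a, int *restrict b) {
    const int tmp = *a;
    *a = *b;
    *b = tmp;
}

void double_swap_alias(int *a, int *b) {
    swap(a, b);
    swap(a, b);
}

/* Hypothetical restrict-free variant: passing the same pointer twice is
   well defined here, unlike with the restrict-qualified swap above. */
static void swap_plain(int *a, int *b) {
    const int tmp = *a;
    *a = *b;
    *b = tmp;
}

static void double_swap_plain(int *a, int *b) {
    swap_plain(a, b);
    swap_plain(a, b);
}

/* True if double_swap_alias leaves two distinct ints unchanged. */
static bool distinct_double_swap_is_noop(int x, int y) {
    int a = x, b = y;
    double_swap_alias(&a, &b);
    return a == x && b == y;
}

/* True if double_swap_plain leaves a single int unchanged when both
   arguments point to it. */
static bool aliased_double_swap_is_noop(int v) {
    int a = v;
    double_swap_plain(&a, &a);
    return a == v;
}
```

In both cases the final state equals the initial state, so the whole function body can legally reduce to `ret`.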


## Actual Output (x86_64 GCC 14 -O3 -march=x86-64-v2)

mov     edx, DWORD PTR [rsi]
mov     eax, DWORD PTR [rdi]
mov     DWORD PTR [rdi], edx
mov     DWORD PTR [rsi], eax
mov     edx, DWORD PTR [rdi]
mov     DWORD PTR [rdi], eax
mov     DWORD PTR [rsi], edx
ret
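
Under the System V AMD64 calling convention, `a` arrives in `rdi` and `b` in `rsi`. The sketch below hand-transcribes this listing back into C, with hypothetical locals `eax` and `edx` standing in for the registers. It shows that the code is correct but wasteful. It reloads `*a` after the store to `*b`, which would only be necessary if that store could have modified `*a`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical C transcription of the -march=x86-64-v2 listing.
   a is passed in rdi and b in rsi (System V AMD64 ABI); the locals
   eax and edx stand in for the registers of the same name. */
static void double_swap_x86_64_v2(int *a, int *b) {
    int edx = *b;   /* mov edx, DWORD PTR [rsi] */
    int eax = *a;   /* mov eax, DWORD PTR [rdi] */
    *a = edx;       /* mov DWORD PTR [rdi], edx */
    *b = eax;       /* mov DWORD PTR [rsi], eax */
    edx = *a;       /* mov edx, DWORD PTR [rdi]: reload, needed only if *b aliases *a */
    *a = eax;       /* mov DWORD PTR [rdi], eax */
    *b = edx;       /* mov DWORD PTR [rsi], edx */
}

/* True if the transcribed sequence leaves two distinct ints unchanged. */
static bool x86_distinct_is_noop(int x, int y) {
    int a = x, b = y;
    double_swap_x86_64_v2(&a, &b);
    return a == x && b == y;
}

/* True if it also leaves a single int unchanged when a == b. */
static bool x86_aliased_is_noop(int v) {
    int a = v;
    double_swap_x86_64_v2(&a, &a);
    return a == v;
}
```

Both checks hold. The three loads and four stores therefore have no net effect, which makes this a pure missed optimization rather than a correctness bug.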


## Actual Output (ARM64 GCC 14 -O3)

ldr     w0, [x1]
str     w0, [x1]
ret
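
Under AAPCS64, `a` is passed in `x0` and `b` in `x1`. These two instructions therefore load `*b` into `w0` and immediately store it back to the same address. The hypothetical C transcription below makes the redundancy visible. For an ordinary (non-`volatile`) `int`, the pair leaves every value unchanged, so it could be dropped, as it is in the -march=skylake x86 output.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical C transcription of the ARM64 listing (AAPCS64: a in x0,
   b in x1). The local w0 stands in for the low 32 bits of register x0. */
static void double_swap_arm64(int *a, int *b) {
    (void)a;        /* a is never dereferenced in the emitted code */
    int w0 = *b;    /* ldr w0, [x1] */
    *b = w0;        /* str w0, [x1]: writes back the value just loaded */
}

/* True if the transcribed sequence leaves both ints unchanged. */
static bool arm64_is_noop(int x, int y) {
    int a = x, b = y;
    double_swap_arm64(&a, &b);
    return a == x && b == y;
}
```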
