https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110916
Bug ID: 110916
Summary: [12/13/14 Regression] Architecture-dependent missed optimizations for double swapping
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: janschultke at googlemail dot com
Target Milestone: ---

GCC's ability to eliminate redundant stores and loads is oddly dependent on the
architecture. Even within the same overall architecture, compiling for Skylake
in particular always gives the best result.

On x86_64 with -march=x86-64-v2, GCC 11 produces the optimal output. GCC 12/13/14
produce suboptimal output compared to -march=skylake.

On ARM64, a strange load and store from/to the same register is emitted. This
is the case for all versions of GCC available on Compiler Explorer.

## Code to Reproduce (https://godbolt.org/z/d7Kcdn8fo)

static void swap(int* restrict a, int* restrict b)
{
    const int tmp = *a;
    *a = *b;
    *b = tmp;
}

void double_swap_alias(int* a, int* b)
{
    swap(a, b);
    swap(a, b);
}

## Expected Output (x86_64 GCC 14 -O3 -march=skylake)

  ret

## Actual Output (x86_64 GCC 14 -O3 -march=x86-64-v2)

  mov edx, DWORD PTR [rsi]
  mov eax, DWORD PTR [rdi]
  mov DWORD PTR [rdi], edx
  mov DWORD PTR [rsi], eax
  mov edx, DWORD PTR [rdi]
  mov DWORD PTR [rdi], eax
  mov DWORD PTR [rsi], edx
  ret

## Actual Output (ARM64 GCC 14 -O3)

  ldr w0, [x1]
  str w0, [x1]
  ret
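
For reference, a minimal sketch of why a bare `ret` is a valid result: two swaps in a row leave both objects with their original values, so every load and store in the function is redundant. The helpers `double_swap_is_identity` and `run_checks` are hypothetical names added here for illustration and are not part of the reproducer. The sketch assumes it is compiled as C (C99 or later, because of `restrict`) and that the two pointers refer to distinct objects, as the `restrict` qualifiers on `swap` require.

```c
#include <assert.h>

/* Same functions as in the reproducer above. */
static void swap(int* restrict a, int* restrict b)
{
    const int tmp = *a;
    *a = *b;
    *b = tmp;
}

void double_swap_alias(int* a, int* b)
{
    swap(a, b);
    swap(a, b);
}

/* Hypothetical helper (not in the report): returns 1 if both values
   are unchanged after the double swap, 0 otherwise. */
int double_swap_is_identity(int x, int y)
{
    int a = x;
    int b = y;
    double_swap_alias(&a, &b); /* distinct objects, so restrict is respected */
    return a == x && b == y;
}

/* Hypothetical helper: a few spot checks of the identity property. */
void run_checks(void)
{
    assert(double_swap_is_identity(1, 2));
    assert(double_swap_is_identity(0, 0));
    assert(double_swap_is_identity(-7, 7));
}
```

This only checks observable behaviour and says nothing about the generated code. The missed optimization itself is visible only in the assembly, for example by comparing the output for the two -march values given above.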