https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89057
--- Comment #7 from Abhiraj Garakapati <abhiraj.garakapati at gmail dot com> ---
This issue is observed during the RTL expand phase (test1.cpp.234r.expand, i.e. during the GIMPLE-to-RTL conversion) with -O1 enabled. It is seen at -O1, -O2 and -O3, but not at -O0.

Each of the three GIMPLE statements below is converted into two move instructions during the GIMPLE-to-RTL conversion:

  _68 = __builtin_aarch64_combinev8qi (_67, { 0, 0, 0, 0, 0, 0, 0, 0 });
  _69 = __builtin_aarch64_combinev8qi (_66, { 0, 0, 0, 0, 0, 0, 0, 0 });
  _70 = __builtin_aarch64_combinev8qi (_65, { 0, 0, 0, 0, 0, 0, 0, 0 });

This does not happen with GCC 7.3.0. It has been seen since GCC 8.1.0 and is caused by the patch:
https://gcc.gnu.org/git/?p=gcc.git;a=patch;h=a977dc0c5e069bf198f78ed4767deac369904301

As a workaround, the issue can be avoided by adding "-fno-move-loop-invariants". On GCC 8.1.0 it can also be fixed by reverting the "aarch64-simd.md" changes from the same patch.

I also cross-checked a toolchain rebuilt with the "aarch64-simd.md" changes reverted against the above-mentioned test case, and got the expected output, the same as GCC 7.3.0. With GCC 8.1.0 and the "aarch64-simd.md" changes reverted, the inner loop is:

  .L5:
          ld3     {v4.8b-v6.8b}, [x1]
          add     x1, x1, #0x18
          mov     v0.8b, v6.8b
          mov     v1.8b, v5.8b
          mov     v2.8b, v4.8b
          mov     v3.16b, v7.16b
          st4     {v0.8b-v3.8b}, [x0]
          add     x0, x0, 32
          cmp     x3, x0
          bhi     .L5

Next, I cross-checked with the test case given in the same patch (https://gcc.gnu.org/git/?p=gcc.git;a=patch;h=a977dc0c5e069bf198f78ed4767deac369904301). According to its description, the patch "improves code generation for literal vector construction by expanding and exposing the pattern to RTL optimization earlier. The current implementation delays splitting the pattern until after reload which results in poor code generation for the following code":

  #include "arm_neon.h"

  int16x8_t
  foo ()
  {
    return vcombine_s16 (vdup_n_s16 (0), vdup_n_s16 (8));
  }

GCC 8.1.0 at -O1 with the "aarch64-simd.md" changes reverted generates:

  foo():
          adrp    x0, 0 <_Z3foov>
          ldr     q0, [x0]
          ret

So reverting the "aarch64-simd.md" changes does not result in poor code generation for this test case: the constant vector is loaded with a single adrp/ldr pair.

I also cross-checked with the latest GCC release, GCC 10.2.0.
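To make the good .L5 inner loop easier to read, here is a portable scalar C sketch of what it appears to compute. It is inferred from the assembly alone and is not the test case from this report. Each iteration's ld3 reads 24 bytes and deinterleaves them into three 8-lane vectors (v4, v5, v6). The three movs reverse the channel order (v0 gets v6, v1 gets v5, v2 gets v4). st4 then re-interleaves them with a fourth vector copied from the loop-invariant v7 and writes 32 bytes. The contents of v7 are not visible in the listing, so the fourth channel is passed in as a parameter. The function name ld3_st4_model, its signature, and the use of a block count instead of the x3 end pointer are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of the .L5 loop (assumption: inferred from the
   listing, not taken from the original test case).
   nblocks = number of loop iterations, i.e. groups of 8 "pixels".
   fourth  = the 8 byte lanes of v7's low half, which st4 stores as channel 3;
             the real value is not shown in the listing. */
static void ld3_st4_model(uint8_t *dst, const uint8_t *src, size_t nblocks,
                          const uint8_t fourth[8])
{
    assert(nblocks == 0 || (dst != NULL && src != NULL && fourth != NULL));
    for (size_t b = 0; b < nblocks; b++) {
        uint8_t c[3][8];                /* c[0..2] play the role of v4..v6 */
        for (int i = 0; i < 8; i++)     /* ld3 {v4.8b-v6.8b}, [x1] */
            for (int k = 0; k < 3; k++)
                c[k][i] = src[3 * i + k];
        for (int i = 0; i < 8; i++) {   /* st4 {v0.8b-v3.8b}, [x0] */
            dst[4 * i + 0] = c[2][i];   /* mov v0.8b, v6.8b */
            dst[4 * i + 1] = c[1][i];   /* mov v1.8b, v5.8b */
            dst[4 * i + 2] = c[0][i];   /* mov v2.8b, v4.8b */
            dst[4 * i + 3] = fourth[i]; /* mov v3.16b, v7.16b (low 8 lanes) */
        }
        src += 24;                      /* add x1, x1, #0x18 */
        dst += 32;                      /* add x0, x0, 32 */
    }
}
```

The model only shows the data movement the loop must perform: one ld3, one st4 and three 8-byte register copies per iteration. The fourth channel is loop-invariant, which is why it lives in v7 outside the loop.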
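The foo() test case from the patch reduces to a single adrp/ldr pair because the whole 128-bit result is a compile-time constant. The portable C sketch below spells out that constant. It does not use the arm_neon.h API. The _model types and functions are hypothetical stand-ins that follow the documented lane semantics: vdup_n_s16 broadcasts a scalar into all 4 lanes, and vcombine_s16 puts its first argument in lanes 0-3 and its second in lanes 4-7.

```c
#include <stdint.h>

/* Hypothetical plain-C stand-ins for int16x4_t / int16x8_t. */
typedef struct { int16_t lane[4]; } s16x4_model;
typedef struct { int16_t lane[8]; } s16x8_model;

/* Models vdup_n_s16: broadcast x into all 4 lanes. */
static s16x4_model dup_n_s16_model(int16_t x)
{
    s16x4_model r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = x;
    return r;
}

/* Models vcombine_s16: lo fills lanes 0-3, hi fills lanes 4-7. */
static s16x8_model combine_s16_model(s16x4_model lo, s16x4_model hi)
{
    s16x8_model r;
    for (int i = 0; i < 4; i++) {
        r.lane[i] = lo.lane[i];
        r.lane[i + 4] = hi.lane[i];
    }
    return r;
}

/* Same expression as foo(). Every input is a constant, so the result is the
   fixed 16-byte value {0, 0, 0, 0, 8, 8, 8, 8}. */
static s16x8_model foo_model(void)
{
    return combine_s16_model(dup_n_s16_model(0), dup_n_s16_model(8));
}
```

Because the value is fully known at compile time, the compiler can emit it as 16 bytes of data and load it into q0 with one ldr. That matches the output above with the "aarch64-simd.md" changes reverted.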