https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97271
SRINATH PARVATHANENI <sripar01 at gcc dot gnu.org> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |RESOLVED Resolution|--- |FIXED --- Comment #1 from SRINATH PARVATHANENI <sripar01 at gcc dot gnu.org> --- Committed to trunk: commit 377535881166969dba43794f298170978d797ef6 Author: Srinath Parvathaneni <srinath.parvathan...@arm.com> Date: Fri Oct 16 11:40:25 2020 +0100 arm: Fix wrong code generated for mve scatter store with writeback intrinsics with -O2 (PR97271). This patch fixes (PR97271) the wrong code-gen for mve scatter store with writeback intrinsics with -O2. $cat bug.c void foo (uint32x4_t * addr, const int offset, int32x4_t value) { vstrwq_scatter_base_wb_s32 (addr, 8, value); } $ arm-none-eabi-gcc bug.c -S -O2 -march=armv8.1-m.main+mve -mfloat-abi=hard -o - Without this patch: ... foo: vldrw.32 q3, [r0] vstrw.u32 q0, [q3, #8]! ---> (A) vldr.64 d4, .L3 vldr.64 d5, .L3+8 vldrw.32 q3, [r0] vstrw.u32 q2, [q3, #8]! ---> (B) bx lr ... With this patch: ... foo: vldrw.32 q3, [r0] vstrw.u32 q0, [q3, #8]! --> (C) vstrw.32 q3, [r0] bx lr ... Without this patch 2 vstrw assembly instructions (A and B) are generated for vstrwq_scatter_base_wb_s32 intrinsic where as fix generates only one vstrw assembly instruction (C). Committed to GCC-10: commit 4199cfa3d18eb99893456bd461061daa75115711 Author: Srinath Parvathaneni <srinath.parvathan...@arm.com> Date: Fri Oct 16 11:40:25 2020 +0100 arm: Fix wrong code generated for mve scatter store with writeback intrinsics with -O2 (PR97271). This patch fixes (PR97271) the wrong code-gen for mve scatter store with writeback intrinsics with -O2. $cat bug.c void foo (uint32x4_t * addr, const int offset, int32x4_t value) { vstrwq_scatter_base_wb_s32 (addr, 8, value); } $ arm-none-eabi-gcc bug.c -S -O2 -march=armv8.1-m.main+mve -mfloat-abi=hard -o - Without this patch: ... foo: vldrw.32 q3, [r0] vstrw.u32 q0, [q3, #8]! ---> (A) vldr.64 d4, .L3 vldr.64 d5, .L3+8 vldrw.32 q3, [r0] vstrw.u32 q2, [q3, #8]! ---> (B) bx lr ... With this patch: ... foo: vldrw.32 q3, [r0] vstrw.u32 q0, [q3, #8]! --> (C) vstrw.32 q3, [r0] bx lr ... Without this patch 2 vstrw assembly instructions (A and B) are generated for vstrwq_scatter_base_wb_s32 intrinsic where as fix generates only one vstrw assembly instruction (C).