[Bug target/89057] [8/9 Regression] AArch64 ld3 st4 less optimized

linux at carewolf dot com Wed, 30 Jan 2019 14:05:07 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89057


--- Comment #4 from Allan Jensen <linux at carewolf dot com> ---
While that change might have made things worse, the real problem is probably
that the registers for those instructions are loaded and stored using
intrinsics, so proper register allocation and combining can't be performed.

On ARMv7, for instance, the same code can be optimized to have no moves, just
a single vswp instruction between the ld3 and st4. MSVC and clang can do
that, but GCC cannot.
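
For illustration, the pattern in question might look like the sketch below. The
function name rgb_to_bgra and the channel layout are assumptions made for this
example, not copied from the bug report. The point is only that the three
registers filled by ld3 are rearranged into the four registers consumed by st4,
so ideally the compiler needs at most a swap of val[0] and val[2] in between.
The scalar loop is a fallback for the tail and for targets without NEON.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: expand packed 3-byte pixels to 4-byte pixels,
 * swapping channels 0 and 2 and setting channel 3 to 0xff. */
static void rgb_to_bgra(uint8_t *dst, const uint8_t *src, size_t pixels)
{
    size_t i = 0;

#if defined(__ARM_NEON)
    /* Eight pixels per iteration: one de-interleaving load (ld3),
     * one interleaving store (st4). */
    for (; i + 8 <= pixels; i += 8) {
        uint8x8x3_t in = vld3_u8(src + 3 * i);
        uint8x8x4_t out;
        out.val[0] = in.val[2];        /* channels 0 and 2 trade places */
        out.val[1] = in.val[1];
        out.val[2] = in.val[0];
        out.val[3] = vdup_n_u8(0xff);  /* constant fourth channel */
        vst4_u8(dst + 4 * i, out);
    }
#endif

    /* Scalar tail; also the whole loop when NEON is unavailable. */
    for (; i < pixels; ++i) {
        dst[4 * i + 0] = src[3 * i + 2];
        dst[4 * i + 1] = src[3 * i + 1];
        dst[4 * i + 2] = src[3 * i + 0];
        dst[4 * i + 3] = 0xff;
    }
}
```

Compiling something of this shape with -O2 and inspecting the instructions
between the ld3 and st4 is one way to compare compilers on this point.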

[Bug target/89057] [8/9 Regression] AArch64 ld3 st4 less optimized