https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89057

--- Comment #4 from Allan Jensen <linux at carewolf dot com> ---
While that change might have made things worse, the real problem is probably
that the registers for those instructions are loaded and stored using
intrinsics, so proper register allocation and combining can't be performed.

On ARMv7, for instance, the same code can be optimized to have no moves at
all, just a single vswp instruction between the ld3 and st4. MSVC and clang
can do that, but GCC cannot.
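
For illustration, the kind of code in question looks roughly like the sketch
below. It is not the test case from the bug report: the function name
rgb888_to_bgra8888 and the exact channel layout are assumptions made for the
example. The three-channel load and four-channel store go through intrinsics,
and the only work in between is exchanging two of the loaded registers, which
is where the extra moves (or, ideally, the single vswp) would show up. A
scalar fallback is included so the sketch also builds on non-ARM targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper (not from the bug report): convert n packed 3-byte
   pixels into 4-byte pixels, swapping channels 0 and 2 and appending an
   opaque 0xff byte. */
void rgb888_to_bgra8888(uint8_t *dst, const uint8_t *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    size_t i = 0;

#if defined(__ARM_NEON)
    /* Process 8 pixels per iteration with the NEON intrinsics. */
    for (; i + 8 <= n; i += 8) {
        /* De-interleaving load: val[0..2] each hold one channel of 8 pixels. */
        uint8x8x3_t in = vld3_u8(src + 3 * i);

        uint8x8x4_t out;
        out.val[0] = in.val[2];        /* swap first and third channel */
        out.val[1] = in.val[1];
        out.val[2] = in.val[0];
        out.val[3] = vdup_n_u8(0xff);  /* constant fourth channel */

        /* Interleaving store of the four channels. */
        vst4_u8(dst + 4 * i, out);
    }
#endif

    /* Scalar tail; also the whole loop when NEON is not available. */
    for (; i < n; ++i) {
        dst[4 * i + 0] = src[3 * i + 2];
        dst[4 * i + 1] = src[3 * i + 1];
        dst[4 * i + 2] = src[3 * i + 0];
        dst[4 * i + 3] = 0xff;
    }
}
```

Comparing the generated assembly for the NEON loop across compilers (for
example at -O2) is one way to see whether the swap costs register moves.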
