https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65375
--- Comment #11 from Ramana Radhakrishnan <ramana at gcc dot gnu.org> ---
(In reply to Jim Wilson from comment #10)
> Improved, but not completely resolved. We still get unnecessary orr
> instructions, same as in comment 2. This is partly an issue with the
> register allocator not handling partially overlapping register reads/writes
> very well. We already have a few other bugs for that. This is also partly
> an issue with how the aarch64 builtins work, via
> __builtin_aarch64_[gs]et_qregoiv4sf which create the partially overlapping
> register reads/writes. The ARM builtins don't work this way, they use a
> union for type punning, and hence don't have the same problem.

Both the ARM and the AArch64 ports have issues with partially overlapping
register reads/writes, especially with the vzip/vuzp style intrinsics in the
AArch32 world, or even the larger vld3/vld4 intrinsics in both the ARM and
AArch64 states. It would be nice to fix that finally.

If that is the only issue left in this ticket, maybe we should just park this
example in the ticket that tracks it (PR43725, IIRC) and close this one out?

regards
Ramana
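
For readers following along, here is a rough, portable sketch of the two styles contrasted in the quoted text. Everything in it is a hypothetical stand-in: `v4sf`, `f32x4x2`, `opaque_oi`, `set_qreg`, `get_qreg` and `pair_from_opaque` are illustrative names only, not the real arm_neon.h types or GCC builtins, and plain C like this will not reproduce the extra orr instructions. It only shows the shape of the accesses: the piecewise style writes one 128-bit half of a 256-bit value at a time (a partial write to a wider object), while the union style reinterprets the whole value in one step.

```c
#include <string.h>

/* Hypothetical stand-ins, NOT the real arm_neon.h definitions. */
typedef struct { float lane[4]; } v4sf;                 /* one 128-bit vector        */
typedef struct { v4sf val[2]; } f32x4x2;                /* user-visible vector pair  */
typedef struct { unsigned char bytes[32]; } opaque_oi;  /* opaque 256-bit value      */

/* Piecewise style: each call touches only half of the wide value,
   loosely modelled on the [gs]et_qreg builtins named in the thread. */
static opaque_oi set_qreg(opaque_oi o, v4sf v, int idx)
{
    memcpy(o.bytes + 16 * idx, &v, sizeof v);
    return o;
}

static v4sf get_qreg(opaque_oi o, int idx)
{
    v4sf v;
    memcpy(&v, o.bytes + 16 * idx, sizeof v);
    return v;
}

/* Union style: the whole 256-bit value is reinterpreted at once. */
static f32x4x2 pair_from_opaque(opaque_oi o)
{
    union { f32x4x2 i; opaque_oi o; } u;
    u.o = o;
    return u.i;
}

/* Build a pair piecewise, read it back both ways, return 1 if they agree. */
static int demo_roundtrip(void)
{
    v4sf lo = { { 1.0f, 2.0f, 3.0f, 4.0f } };
    v4sf hi = { { 5.0f, 6.0f, 7.0f, 8.0f } };
    opaque_oi o;
    memset(&o, 0, sizeof o);

    o = set_qreg(o, lo, 0);   /* partial write: low half  */
    o = set_qreg(o, hi, 1);   /* partial write: high half */

    f32x4x2 whole = pair_from_opaque(o);   /* single whole-value view */
    v4sf lo2 = get_qreg(o, 0);
    v4sf hi2 = get_qreg(o, 1);

    return memcmp(&whole.val[0], &lo2, sizeof lo2) == 0
        && memcmp(&whole.val[1], &hi2, sizeof hi2) == 0
        && whole.val[1].lane[3] == 8.0f;
}
```

Both paths yield the same data; the difference that matters for this PR is only how the compiler sees the accesses to the wide value.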