https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65375
--- Comment #10 from Jim Wilson <wilson at gcc dot gnu.org> ---
Improved, but not completely resolved. We still get unnecessary orr instructions, same as in comment 2.

This is partly an issue with the register allocator not handling partially overlapping register reads/writes very well. We already have a few other bugs for that.

This is also partly an issue with how the aarch64 builtins work: they go through __builtin_aarch64_[gs]et_qregoiv4sf, which create the partially overlapping register reads/writes. The ARM builtins don't work this way; they use a union for type punning, and hence don't have the same problem.
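To make the contrast between the two styles concrete, here is a minimal portable sketch. It is not the code from either arm_neon.h: the types vec4f, vec4f_pair and opaque_oi and the helper get_q are hypothetical stand-ins for the vector pair type, the opaque 256-bit register-pair type, and a per-half accessor builtin. Being plain scalar C, it will not reproduce the orr instructions; it only shows the difference in shape between whole-value punning and half-at-a-time access.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins, NOT the real arm_neon.h types. */
typedef struct { float lane[4]; } vec4f;               /* one 128-bit vector */
typedef struct { vec4f val[2]; } vec4f_pair;           /* pair of vectors */
typedef struct { unsigned char bytes[32]; } opaque_oi; /* opaque 256-bit value */

static_assert(sizeof(vec4f_pair) == sizeof(opaque_oi),
              "punning requires identical sizes");

/* Union style: the whole value is reinterpreted in one step, so no
   individual half of the register pair is read or written on its own. */
static vec4f_pair pair_from_opaque_union(opaque_oi o)
{
    union { vec4f_pair i; opaque_oi o; } rv;
    rv.o = o;
    return rv.i;
}

/* Hypothetical accessor: extracts one 128-bit half of the opaque value.
   Each call is a partial read of the wider object. */
static vec4f get_q(opaque_oi o, int idx)
{
    vec4f v;
    memcpy(&v, o.bytes + (size_t)idx * sizeof(vec4f), sizeof v);
    return v;
}

/* Accessor style: the result is assembled from two partial reads. */
static vec4f_pair pair_from_opaque_accessors(opaque_oi o)
{
    vec4f_pair r;
    r.val[0] = get_q(o, 0);
    r.val[1] = get_q(o, 1);
    return r;
}
```

Both functions return the same bytes; the difference is only in how the wide value is taken apart, which is the part the comment identifies as interacting badly with the register allocator.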