The main purpose of this patch series is to fix a performance regression relative to GCC 8. Before the series:
#include <arm_neon.h>

int64x2_t s64q_1(int64_t a0, int64_t a1) {
  if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
    return (int64x2_t) { a1, a0 };
  else
    return (int64x2_t) { a0, a1 };
}

generated:

        fmov    d0, x0
        ins     v0.d[1], x1
        ins     v0.d[1], x1
        ret

(note the redundant second "ins"), whereas GCC 8 generated the more
respectable:

        dup     v0.2d, x0
        ins     v0.d[1], x1
        ret

But there are some related knock-on changes that IMO are needed to keep
things in a consistent and maintainable state.  There is still more
cleanup and optimisation that could be done in this area, but that's
definitely GCC 13 material.

Tested on aarch64-linux-gnu and aarch64_be-elf, pushed.

Sorry for the size of the series, but it really did seem like the best
fix in the circumstances.

Richard
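
P.S. For anyone who wants to experiment, below is a minimal sketch of
what the ideal GCC 8 sequence looks like at the intrinsics level.  The
function name s64q_dup_ins is made up for illustration and is not part
of the series:

#include <arm_neon.h>

/* Broadcast a0 to both lanes (dup v0.2d, x0), then overwrite
   lane 1 with a1 (ins v0.d[1], x1).  */
int64x2_t s64q_dup_ins(int64_t a0, int64_t a1) {
  int64x2_t res = vdupq_n_s64(a0);
  return vsetq_lane_s64(a1, res, 1);
}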