Much like the ZIP and UZP intrinsics, the vtrn[q]_* intrinsics are implemented
with inline __asm__, which blocks compiler analysis. This series replaces those
calls with __builtin_shuffle, which produces the same** assembler instructions.
** except for two-element vectors, where UZP, ZIP and TRN are all equivalent and
the backend chooses to output ZIP.
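For reference, here is a minimal sketch of the kind of shuffle involved. It
uses GCC's generic vector extensions and __builtin_shuffle directly rather
than the arm_neon.h types, and the type and function names (v4si, trn1_v4si,
etc.) are hypothetical, for illustration only. It is not the code from the
series. A vtrnq intrinsic returns a pair whose first member corresponds to
TRN1 and whose second corresponds to TRN2. The two-element functions show why
the footnote holds: their masks are identical to the ZIP and UZP masks.

```c
#include <stdint.h>

/* Hypothetical GCC vector-extension types (not the arm_neon.h types).
   The shuffle mask must be an integer vector with the same element
   count and element width as the data vectors. */
typedef int32_t  v4si  __attribute__ ((vector_size (16)));
typedef uint32_t v4usi __attribute__ ((vector_size (16)));
typedef int64_t  v2di  __attribute__ ((vector_size (16)));
typedef uint64_t v2udi __attribute__ ((vector_size (16)));

/* Mask indices 0..N-1 select lanes of a, and N..2N-1 select lanes of b. */

/* TRN1: even lanes of a and b, interleaved: { a[0], b[0], a[2], b[2] }. */
static v4si trn1_v4si (v4si a, v4si b)
{
  return __builtin_shuffle (a, b, (v4usi) { 0, 4, 2, 6 });
}

/* TRN2: odd lanes of a and b, interleaved: { a[1], b[1], a[3], b[3] }. */
static v4si trn2_v4si (v4si a, v4si b)
{
  return __builtin_shuffle (a, b, (v4usi) { 1, 5, 3, 7 });
}

/* With two elements, TRN1, ZIP1 and UZP1 all select { a[0], b[0] },
   so this mask cannot distinguish them (hence ZIP in the output). */
static v2di trn1_v2di (v2di a, v2di b)
{
  return __builtin_shuffle (a, b, (v2udi) { 0, 2 });
}

/* Likewise TRN2, ZIP2 and UZP2 all select { a[1], b[1] }. */
static v2di trn2_v2di (v2di a, v2di b)
{
  return __builtin_shuffle (a, b, (v2udi) { 1, 3 });
}
```

This requires GCC, since __builtin_shuffle is a GCC built-in.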
The first patch adds a set of tests, which pass with the current asm
implementation;
the second patch reimplements the intrinsics with __builtin_shuffle;
the third patch adds equivalent ARM tests that reuse the test bodies from the
first patch.
OK for stage 1?
Cheers, Alan