Much like the ZIP and UZP intrinsics, the vtrn[q]_* intrinsics are implemented with inline __asm__, which blocks compiler analysis. This series replaces that inline asm with calls to __builtin_shuffle, which produces the same** assembler instructions.

** except for two-element vectors, where UZP, ZIP and TRN are all equivalent and the backend chooses to output ZIP.
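
For illustration only, here is a minimal sketch of the kind of reimplementation described above. It is not taken from the patches. It uses GCC generic vector types so that it builds on any target; v4i16, v4u16, v4i16x2 and trn_s16 are hypothetical stand-ins for int16x4_t, uint16x4_t, int16x4x2_t and vtrn_s16 from arm_neon.h. The masks assume little-endian lane numbering; a real implementation would presumably need different masks for big-endian.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the arm_neon.h vector types.  */
typedef int16_t  v4i16 __attribute__ ((vector_size (8)));
typedef uint16_t v4u16 __attribute__ ((vector_size (8)));

/* Hypothetical stand-in for the two-vector result type.  */
typedef struct { v4i16 val[2]; } v4i16x2;

static inline v4i16x2
trn_s16 (v4i16 a, v4i16 b)
{
  v4i16x2 r;
  /* Mask indices 0..3 select lanes of a, 4..7 select lanes of b.  */
  r.val[0] = __builtin_shuffle (a, b, (v4u16) {0, 4, 2, 6}); /* TRN1: even lanes.  */
  r.val[1] = __builtin_shuffle (a, b, (v4u16) {1, 5, 3, 7}); /* TRN2: odd lanes.  */
  return r;
}
```

Because the shuffle is ordinary GIMPLE rather than opaque asm, the compiler can constant-fold it or combine it with neighbouring operations. With two-lane vectors the corresponding masks would be {0, 2} and {1, 3}, which are also the ZIP and UZP masks, consistent with the footnote above.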

The first patch adds a bunch of tests, passing for the current asm implementation;
the second patch reimplements with __builtin_shuffle;
the third patch adds equivalent ARM tests using test bodies shared from the first patch.

OK for stage 1?

Cheers, Alan
