https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117173
--- Comment #6 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Please have a look at the i386 backend, where for constant permutes it tries a sequence of 1, 2, 3, 4 or even 5 instructions to do the various permutations. It isn't perfect and surely misses some cases that could be done more optimally, but starting in the backend with undoing the case you've filed this for would be useful. As Andrew wrote, the user could have written it using __builtin_shuffle/__builtin_shufflevector with the combined permutation from the start.
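
To illustrate the last point, here is a minimal sketch of two chained constant permutes next to a single __builtin_shuffle that uses the combined mask. It is not the testcase from this PR: the v4si typedef, the masks p1/p2 and all function names are made up for illustration, and it assumes GCC's generic vector extensions on a target with 16-byte vectors.

```c
#include <assert.h>

/* Four 32-bit ints in one 16-byte vector (GCC vector extension).
   The name v4si is just a local typedef for this example.  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Two constant permutes applied one after the other.
   The masks are arbitrary and chosen only for illustration.  */
v4si
two_step (v4si x)
{
  const v4si p1 = { 1, 0, 3, 2 };	/* swap elements within each pair */
  const v4si p2 = { 2, 3, 0, 1 };	/* swap the two halves */
  v4si t = __builtin_shuffle (x, p1);	/* t[i] = x[p1[i]] */
  return __builtin_shuffle (t, p2);	/* r[i] = t[p2[i]] = x[p1[p2[i]]] */
}

/* The same result with one permute: combined[i] = p1[p2[i]],
   i.e. { p1[2], p1[3], p1[0], p1[1] } = { 3, 2, 1, 0 }.  */
v4si
one_step (v4si x)
{
  const v4si combined = { 3, 2, 1, 0 };
  return __builtin_shuffle (x, combined);
}

/* Hypothetical helper: assert that both variants agree for X.  */
void
check_equivalence (v4si x)
{
  v4si a = two_step (x);
  v4si b = one_step (x);
  for (int i = 0; i < 4; i++)
    assert (a[i] == b[i]);
}
```

Both functions compute the same vector, so whatever sequence the backend picks for one_step is also a valid expansion of two_step. Whether the two forms currently get the same code depends on the target and on how well the permutes are combined, which is the subject of this PR.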