https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101579
--- Comment #4 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Jakub Jelinek from comment #2)
> As
> typedef unsigned char V __attribute__((vector_size (32)));
>
> V
> foo (V x)
> {
>   return __builtin_shufflevector (x, x, 0, 1, 2, 0, 5, 1, 0, 1, 3, 2, 3, 0,
>                                   4, 3, 1, 2, 2, 0, 4, 2, 3, 1, 1, 2, 3, 4,
>                                   1, 1, 0, 0, 5, 2);
> }
>
> V
> bar (V x)
> {
>   return __builtin_shufflevector (x, x, 0, 3, 3, 3, 3, 4, 5, 0, 1, 5, 2, 1,
>                                   0, 1, 1, 2, 3, 2, 0, 5, 4, 5, 1, 0, 1, 4,
>                                   4, 3, 4, 5, 2, 0);
> }
> with -O2 -mavx2 is handled, I'd say this is veclower task to determine that
> the particular permutation could be cheaply implemented with two
> permutations of half-sized vectors and ask the backend if it supports those.
> Of course there can be other permutations that can't be implemented that way
> easily and might e.g. need more half-sized permutations...

It looks to me that the middle end should be able to transform a 64-byte
vector shuffle into a 32-byte vector shuffle when data flow analysis shows
that the upper part of the vector is never used.
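
To make the narrowing idea concrete, here is a rough scalar model in plain C.
It is only a sketch and not GCC code: the names (shuffle_wide, shuffle_half,
low_half_is_narrowable, narrow_selector) are made up for illustration, and it
assumes the usual two-input selector convention where indices 0..N-1 pick
from the first input and N..2N-1 from the second. It also assumes that, in
addition to the upper result half being dead, the live lanes read only the
low halves of the inputs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { WIDE = 64, HALF = 32 };

/* Reference model of a two-input WIDE-lane byte shuffle:
   selector values 0..WIDE-1 read from a, WIDE..2*WIDE-1 read from b.  */
static void
shuffle_wide (const unsigned char *a, const unsigned char *b,
              const unsigned char *sel, unsigned char *out)
{
  for (size_t i = 0; i < WIDE; i++)
    out[i] = sel[i] < WIDE ? a[sel[i]] : b[sel[i] - WIDE];
}

/* Same model for HALF-lane vectors.  */
static void
shuffle_half (const unsigned char *a, const unsigned char *b,
              const unsigned char *sel, unsigned char *out)
{
  for (size_t i = 0; i < HALF; i++)
    out[i] = sel[i] < HALF ? a[sel[i]] : b[sel[i] - HALF];
}

/* Hypothetical check: assuming the upper HALF result lanes are dead,
   the shuffle can be narrowed if every live lane reads only from the
   low HALF lanes of either input.  */
static bool
low_half_is_narrowable (const unsigned char *sel)
{
  for (size_t i = 0; i < HALF; i++)
    if (sel[i] >= 2 * WIDE || sel[i] % WIDE >= HALF)
      return false;
  return true;
}

/* Rewrite the live part of the wide selector so that it indexes the
   concatenation of the two narrowed (HALF-lane) inputs.  */
static void
narrow_selector (const unsigned char *sel, unsigned char *out)
{
  assert (low_half_is_narrowable (sel));
  for (size_t i = 0; i < HALF; i++)
    out[i] = sel[i] < WIDE ? sel[i] : (unsigned char) (sel[i] - WIDE + HALF);
}
```

With a selector that passes the check, the low 32 bytes produced by
shuffle_wide match what shuffle_half produces from the narrowed selector,
which is the equivalence such a transform would rely on. Selectors that
reach into the upper input lanes fail the check and would need a different
strategy, such as the multiple half-sized permutations mentioned in
comment #2.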