https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101579
--- Comment #5 from Hongtao.liu <crazylht at gmail dot com> ---
After manually eliminating the upper part vector shuffle, codegen is much
better:

https://godbolt.org/z/d3YhzzYfo

typedef unsigned int __attribute__((__vector_size__ (32))) U;
typedef unsigned char __attribute__((__vector_size__ (32))) V;

V g;

U
foo (void)
{
  V v = __builtin_shufflevector (g, g,
                                 0, 1, 2, 0, 5, 1, 0, 1,
                                 3, 2, 3, 0, 4, 3, 1, 2,
                                 2, 0, 4, 2, 3, 1, 1, 2,
                                 3, 4, 1, 1, 0, 0, 5, 2);
  v ^= 255;
  V w = v + g;
  U u = ((union { V a; U b; }) w).b + ((union { V a; U b; }) w).b[1];
  return u;
}
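
For anyone checking the generated code against expected values, here is a rough scalar model of what foo computes. It is only a sketch and is not part of the bug report: the names foo_scalar and idx are made up for illustration, and it assumes the usual GNU vector semantics (element-wise wrap-around arithmetic, scalar operands broadcast to every lane, and the union cast reinterpreting the same 32 bytes).

```c
#include <stdint.h>
#include <string.h>

/* Shuffle indices copied from the test case above.  All of them are
   below 32, so they select from the first operand only.  */
static const unsigned char idx[32] = {
  0, 1, 2, 0, 5, 1, 0, 1,
  3, 2, 3, 0, 4, 3, 1, 2,
  2, 0, 4, 2, 3, 1, 1, 2,
  3, 4, 1, 1, 0, 0, 5, 2
};

/* Hypothetical scalar reference for foo: g is the 32-byte input
   vector, out receives the eight 32-bit result lanes.  */
void
foo_scalar (const uint8_t g[32], uint32_t out[8])
{
  uint8_t w[32];
  uint32_t lanes[8];

  for (int i = 0; i < 32; i++)
    {
      uint8_t v = g[idx[i]] ^ 0xff;   /* shuffle, then v ^= 255 */
      w[i] = (uint8_t) (v + g[i]);    /* w = v + g, modulo 256 */
    }

  /* Reinterpret the bytes as eight 32-bit lanes, like the union cast.  */
  memcpy (lanes, w, sizeof lanes);

  /* Lane 1 is added to every lane (scalar broadcast).  */
  for (int i = 0; i < 8; i++)
    out[i] = lanes[i] + lanes[1];
}
```

When every byte of g has the same value c, each byte of w is (c ^ 255) + c = 255, so every lane of the result is 0xffffffff + 0xffffffff, which wraps to 0xfffffffe. That gives a quick endianness-independent sanity check.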