https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62128
Bug ID: 62128
Summary: Use vpalignr for AVX2 rotation
Product: gcc
Version: 5.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: enhancement
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: glisse at gcc dot gnu.org
Target: x86_64-linux-gnu

typedef unsigned char vec __attribute__((vector_size(32)));
vec f(vec x){
  vec m={1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,
         17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,0};
  return __builtin_shuffle(x,m);
}

We generate, with -O3 -mavx2:

        vpshufb .LC0(%rip), %ymm0, %ymm1
        vpshufb .LC1(%rip), %ymm0, %ymm0
        vpermq  $78, %ymm1, %ymm1
        vpor    %ymm1, %ymm0, %ymm0

But unless I am mistaken, a lane swap followed by vpalignr should do it in 2 instructions and without reading constants from memory. There is a function expand_vec_perm_palignr, but it only handles some 128-bit cases. Even for permutations that can be done with a single 256-bit vpalignr instruction, we never seem to generate it.
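
For illustration, here is roughly the two-instruction sequence I have in mind, written with AVX2 intrinsics (the function name is arbitrary, and this assumes I read the per-lane semantics of vpalignr correctly):

#include <immintrin.h>

/* Rotate the 32 bytes of x down by one element, i.e. result[i] = x[(i+1) % 32],
   which is the permutation in f() above. */
__m256i rotate_bytes_by_1(__m256i x)
{
    /* vpermq $0x4e: swap the two 128-bit lanes (0x4e == 78, the same
       immediate that appears in the current code generation above). */
    __m256i swapped = _mm256_permute4x64_epi64(x, 0x4e);
    /* vpalignr $1: within each 128-bit lane, concatenate the lane of
       `swapped' (high half) with the lane of `x' (low half) and shift
       right by one byte, so each lane gets its top byte from the other
       lane of the original vector. */
    return _mm256_alignr_epi8(swapped, x, 1);
}

Because the 256-bit vpalignr only shifts within each 128-bit lane, the preceding lane swap is what supplies the byte that crosses the lane boundary; no constant pool loads are needed.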