https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62128

            Bug ID: 62128
           Summary: Use vpalignr for AVX2 rotation
           Product: gcc
           Version: 5.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: enhancement
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: glisse at gcc dot gnu.org
            Target: x86_64-linux-gnu

typedef unsigned char vec __attribute__((vector_size(32)));
vec f(vec x){
  /* Rotate the 32 bytes of x by one: result[i] = x[(i+1) % 32].  */
  vec m={1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,
         25,26,27,28,29,30,31,0};
  return __builtin_shuffle(x,m);
}

We generate, with -O3 -mavx2:

    vpshufb    .LC0(%rip), %ymm0, %ymm1
    vpshufb    .LC1(%rip), %ymm0, %ymm0
    vpermq     $78, %ymm1, %ymm1
    vpor       %ymm1, %ymm0, %ymm0

But unless I am mistaken, a lane swap and vpalignr should do it in two
instructions, without reading any constants from memory. There is a function
expand_vec_perm_palignr, but it only handles some 128-bit cases. Even for
permutations that can be done with a single 256-bit vpalignr instruction, we
never seem to generate it.
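
Concretely, a sketch of the hoped-for two-instruction sequence (AT&T syntax,
register allocation only illustrative): vperm2i128 with immediate 1 swaps the
two 128-bit lanes, and vpalignr then shifts each 16-byte lane by one, pulling
the byte that crosses a lane boundary from the swapped copy:

    vperm2i128    $1, %ymm0, %ymm0, %ymm1    # ymm1 = {x[16..31], x[0..15]}
    vpalignr      $1, %ymm0, %ymm1, %ymm0    # per lane: (ymm1:ymm0) >> 8 bits

Checking lane 0: bytes 0..14 come from x[1..15] and byte 15 from
ymm1's lane 0, i.e. x[16]; lane 1 likewise yields x[17..31] followed by x[0],
which is exactly the rotation above.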
