https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121892

            Bug ID: 121892
           Summary: Optimize AVX2 VEC_CONVERT from short to char
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: mkretz at gcc dot gnu.org
  Target Milestone: ---
            Target: x86-64-*-*, i686-*-*

Test case (https://compiler-explorer.com/z/arn73Ps9f):

using From = short;
using To = char;
constexpr int N = 16;

using V0 [[gnu::vector_size(sizeof(From) * N)]] = From;
using V1 [[gnu::vector_size(sizeof(To) * N)]] = To;

V1 a(V0 x) { return __builtin_convertvector(x, V1); }

With -O2 -march=x86-64-v3, it compiles to:

        vpshufb ymm1, ymm0, YMMWORD PTR .LC0[rip]
        vpshufb ymm0, ymm0, YMMWORD PTR .LC1[rip]
        vpermq  ymm1, ymm1, 78
        vpor    ymm0, ymm0, ymm1

It can be optimized to:

        vpshufb ymm0, ymm0, YMMWORD PTR .LC0[rip]
        vpermq  ymm0, ymm0, 0xd8

With .LC0:
        .byte   0
        .byte   2
        .byte   4
        .byte   6
        .byte   8
        .byte   10
        .byte   12
        .byte   14
        .byte   -128
        .byte   -128
        .byte   -128
        .byte   -128
        .byte   -128
        .byte   -128
        .byte   -128
        .byte   -128
        .byte   0
        .byte   2
        .byte   4
        .byte   6
        .byte   8
        .byte   10
        .byte   12
        .byte   14
        .byte   -128
        .byte   -128
        .byte   -128
        .byte   -128
        .byte   -128
        .byte   -128
        .byte   -128
        .byte   -128

I.e., use vpshufb to move the low byte of each short into the lower 64 bits of
its 128-bit lane, zeroing the upper 64 bits of that lane. Then use vpermq to
swap the two inner 64-bit parts, which packs the 16 result bytes into the lower
128 bits and leaves the upper 128 bits zeroed.
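
As a sanity check of the shuffle mask and the 0xd8 immediate, here is a minimal
scalar sketch that models the two instructions in portable C++. It is not GCC
code and does not use intrinsics. The helper names (model_vpshufb,
model_vpermq, make_lc0, narrow_model) are made up for illustration, and the
input is assumed to be laid out little-endian as it would be in a ymm register.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Bytes = std::array<std::uint8_t, 32>;  // one 256-bit register, byte-wise

// Scalar model of AVX2 vpshufb: each 128-bit lane is shuffled on its own.
// A control byte with bit 7 set produces 0; otherwise its low 4 bits select
// a byte from the same lane of the source.
Bytes model_vpshufb(const Bytes& src, const Bytes& ctrl) {
    Bytes out{};
    for (int i = 0; i < 32; ++i) {
        const int lane_base = i & 16;  // 0 for the low lane, 16 for the high lane
        out[i] = (ctrl[i] & 0x80) ? 0 : src[lane_base + (ctrl[i] & 0x0f)];
    }
    return out;
}

// Scalar model of vpermq with an 8-bit immediate: destination qword q is
// taken from source qword (imm >> 2*q) & 3.
Bytes model_vpermq(const Bytes& src, unsigned imm) {
    assert(imm < 256);
    Bytes out{};
    for (int q = 0; q < 4; ++q) {
        const int sel = (imm >> (2 * q)) & 3;
        for (int b = 0; b < 8; ++b)
            out[8 * q + b] = src[8 * sel + b];
    }
    return out;
}

// The .LC0 constant from the report: 0,2,...,14 followed by eight 0x80
// (-128) bytes, repeated for both lanes.
Bytes make_lc0() {
    Bytes c{};
    for (int i = 0; i < 32; ++i)
        c[i] = (i % 16 < 8) ? static_cast<std::uint8_t>(2 * (i % 16))
                            : static_cast<std::uint8_t>(0x80);
    return c;
}

// Hypothetical helper: run the proposed vpshufb + vpermq sequence on 16
// shorts and return the full 32-byte result register.
Bytes narrow_model(const std::array<std::int16_t, 16>& x) {
    Bytes in{};
    for (int i = 0; i < 16; ++i) {
        const std::uint16_t u = static_cast<std::uint16_t>(x[i]);
        in[2 * i] = static_cast<std::uint8_t>(u & 0xff);   // low byte first
        in[2 * i + 1] = static_cast<std::uint8_t>(u >> 8); // then high byte
    }
    return model_vpermq(model_vpshufb(in, make_lc0()), 0xd8);
}
```

In this model, bytes 0..15 of the result are the truncated input elements in
order and bytes 16..31 are zero, which matches the description above. The
immediate 0xd8 decodes to the qword order 0, 2, 1, 3.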
