https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121892
Bug ID: 121892
Summary: Optimize AVX2 VEC_CONVERT from short to char
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: mkretz at gcc dot gnu.org
Target Milestone: --
Target: x86-64-*-*, i686-*-*

Test case (https://compiler-explorer.com/z/arn73Ps9f):

    using From = short;
    using To = char;
    constexpr int N = 16;
    using V0 [[gnu::vector_size(sizeof(From) * N)]] = From;
    using V1 [[gnu::vector_size(sizeof(To) * N)]] = To;

    V1 a(V0 x)
    {
      return __builtin_convertvector(x, V1);
    }

With -O2 -march=x86-64-v3, it compiles to:

    vpshufb ymm1, ymm0, YMMWORD PTR .LC0[rip]
    vpshufb ymm0, ymm0, YMMWORD PTR .LC1[rip]
    vpermq  ymm1, ymm1, 78
    vpor    ymm0, ymm0, ymm1

This can be optimized to:

    vpshufb ymm0, ymm0, YMMWORD PTR .LC0[rip]
    vpermq  ymm0, ymm0, 0xd8

with .LC0:

    .LC0:
            .byte   0, 2, 4, 6, 8, 10, 12, 14
            .byte   -128, -128, -128, -128, -128, -128, -128, -128
            .byte   0, 2, 4, 6, 8, 10, 12, 14
            .byte   -128, -128, -128, -128, -128, -128, -128, -128

That is, use vpshufb to move the low byte of each short into the lower 64 bits of each 128-bit lane (control bytes of -128 write zero). Then use vpermq with 0xd8, which selects qwords [0, 2, 1, 3], to swap the two inner 64-bit parts. The upper 128 bits of the result are then already zero.
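A minimal C++ sketch can check that the proposed two-instruction sequence produces the same bytes as the truncating short-to-char conversion. It models VPSHUFB (256-bit, per-lane) and VPERMQ on plain byte arrays, assuming the little-endian layout x86 uses. The helper names `vpshufb`, `vpermq`, `lc0`, `convert_proposed`, `proposed_matches_truncation` and `convert_avx2` are hypothetical, chosen only for this illustration; when AVX2 is enabled at compile time, the block also shows the same sequence written with the `_mm256_shuffle_epi8` and `_mm256_permute4x64_epi64` intrinsics.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of a 256-bit register as 32 bytes (byte 0 = lowest).
using Ymm = std::array<std::uint8_t, 32>;

// Model of VPSHUFB ymm: bytes are shuffled independently within each 128-bit
// lane. A control byte with bit 7 set writes zero; otherwise its low 4 bits
// select a byte from the same lane of the source.
Ymm vpshufb(const Ymm& src, const Ymm& ctl) {
  Ymm dst{};
  for (int i = 0; i < 32; ++i) {
    int lane_base = i & ~15;  // 0 for the low lane, 16 for the high lane
    dst[i] = (ctl[i] & 0x80) ? std::uint8_t{0}
                             : src[lane_base + (ctl[i] & 0x0f)];
  }
  return dst;
}

// Model of VPERMQ ymm, imm8: destination qword k takes source qword
// (imm >> 2k) & 3. With imm = 0xd8 that is qwords [0, 2, 1, 3].
Ymm vpermq(const Ymm& src, unsigned imm) {
  Ymm dst{};
  for (int k = 0; k < 4; ++k) {
    int sel = (imm >> (2 * k)) & 3;
    for (int b = 0; b < 8; ++b) dst[8 * k + b] = src[8 * sel + b];
  }
  return dst;
}

// The .LC0 constant from the report, as unsigned bytes (-128 == 0x80).
Ymm lc0() {
  Ymm c{};
  for (int lane = 0; lane < 2; ++lane)
    for (int j = 0; j < 16; ++j)
      c[16 * lane + j] = j < 8 ? static_cast<std::uint8_t>(2 * j)
                               : std::uint8_t{0x80};
  return c;
}

// Proposed sequence: vpshufb with .LC0, then vpermq with 0xd8.
Ymm convert_proposed(const std::array<std::int16_t, 16>& in) {
  Ymm x{};
  for (int i = 0; i < 16; ++i) {  // store the shorts little-endian
    auto u = static_cast<std::uint16_t>(in[i]);
    x[2 * i] = static_cast<std::uint8_t>(u & 0xff);
    x[2 * i + 1] = static_cast<std::uint8_t>(u >> 8);
  }
  return vpermq(vpshufb(x, lc0()), 0xd8);
}

// True if the low 16 bytes equal the truncated shorts and the upper 16 are zero.
bool proposed_matches_truncation(const std::array<std::int16_t, 16>& in) {
  Ymm r = convert_proposed(in);
  for (int i = 0; i < 16; ++i)
    if (r[i] != static_cast<std::uint8_t>(in[i])) return false;
  for (int i = 16; i < 32; ++i)
    if (r[i] != 0) return false;
  return true;
}

#if defined(__AVX2__)
#include <immintrin.h>
// The same two instructions via intrinsics (e.g. -march=x86-64-v3).
__m128i convert_avx2(__m256i x) {
  const __m256i ctl = _mm256_setr_epi8(
      0, 2, 4, 6, 8, 10, 12, 14, -128, -128, -128, -128, -128, -128, -128, -128,
      0, 2, 4, 6, 8, 10, 12, 14, -128, -128, -128, -128, -128, -128, -128, -128);
  __m256i t = _mm256_shuffle_epi8(x, ctl);  // vpshufb
  t = _mm256_permute4x64_epi64(t, 0xd8);    // vpermq ..., 0xd8
  return _mm256_castsi256_si128(t);         // low 128 bits hold the 16 chars
}
#endif
```

Passing values such as 300 or -129 exercises the truncation: only the low byte of each short survives, and bytes 16 to 31 of the modelled register stay zero, which is why no vpor or second shuffle constant is needed.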