https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88698

Alexander Monakov <amonakov at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amonakov at gcc dot gnu.org

--- Comment #6 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
My recommendation is to use a union like the one below. It lets you mix GCC
generic vector operations and intrinsics without casts, and it makes each
operation show exactly which lane type it operates on:

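#include <immintrin.h>   /* __m128i and the SSE/SSSE3 intrinsics */
#include <string.h>      /* memcpy */

typedef unsigned char  u8v  __attribute__((vector_size(16)));
typedef unsigned short u16v __attribute__((vector_size(16)));
typedef unsigned int   u32v __attribute__((vector_size(16)));

/* Every member views the same 16 bytes with a different lane type. */
typedef union {
        u8v   u8;
        u16v  u16;
        u32v  u32;
        __m128i m;
} uv;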

Example use:

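        /* ptr points to 16 input bytes; lut is a uv holding a 16-entry
           byte table indexed by nibble value. */
        uv x, t, lo_nib, hi_nib;

        memcpy(&x, ptr, sizeof x);
        t.u32     = x.u32 >> 4;   /* 32-bit lanes: each byte's high nibble moves down */
        lo_nib.u8 = x.u8 & 15;    /* 8-bit lanes: keep the low nibble of each byte */
        hi_nib.u8 = t.u8 & 15;    /* drop bits shifted in from the neighbouring byte */
        lo_nib.m  = _mm_shuffle_epi8(lut.m, lo_nib.m);  /* SSSE3 pshufb table lookup */
        hi_nib.m  = _mm_shuffle_epi8(lut.m, hi_nib.m);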
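For reference, here is a self-contained sketch that completes the fragment
above under some assumptions: `lut` is taken to be a per-nibble popcount table,
`ptr` points to 16 input bytes, and the hypothetical function `popcount_bytes`
adds the two lookups to get the population count of every byte. The
`target("ssse3")` attribute lets it build without `-mssse3`, but it still
assumes an x86 CPU with SSSE3 at run time.

```c
#include <immintrin.h>
#include <string.h>

typedef unsigned char  u8v  __attribute__((vector_size(16)));
typedef unsigned short u16v __attribute__((vector_size(16)));
typedef unsigned int   u32v __attribute__((vector_size(16)));

typedef union {
        u8v   u8;
        u16v  u16;
        u32v  u32;
        __m128i m;
} uv;

/* Hypothetical helper: out[i] = popcount(ptr[i]) for 16 bytes.
   Enables SSSE3 for this function only, so _mm_shuffle_epi8 can be used. */
__attribute__((target("ssse3")))
void popcount_bytes(unsigned char out[16], const unsigned char *ptr)
{
        /* Assumed table: popcount of each 4-bit value 0..15. */
        const uv lut = { .u8 = { 0, 1, 1, 2, 1, 2, 2, 3,
                                 1, 2, 2, 3, 2, 3, 3, 4 } };
        uv x, t, lo_nib, hi_nib, r;

        memcpy(&x, ptr, sizeof x);
        t.u32     = x.u32 >> 4;   /* 32-bit lanes: high nibbles move down */
        lo_nib.u8 = x.u8 & 15;    /* 8-bit lanes: low nibble of each byte */
        hi_nib.u8 = t.u8 & 15;    /* 8-bit lanes: high nibble of each byte */
        lo_nib.m  = _mm_shuffle_epi8(lut.m, lo_nib.m);
        hi_nib.m  = _mm_shuffle_epi8(lut.m, hi_nib.m);
        r.u8      = lo_nib.u8 + hi_nib.u8;   /* at most 4 + 4 per byte */
        memcpy(out, &r, sizeof r);
}
```

Each step names its lane width (`.u32` for the shift, `.u8` for masking and
adding, `.m` for the intrinsic) and needs no casts.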

The union also makes it possible to write 256-bit and 128-bit versions of the
same code together where appropriate, with a few extra macros that select the
right intrinsic for each width.
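One possible shape for those macros is sketched below. The macro names
`DEFINE_UV` and `DEFINE_POPCOUNT_BYTES` are hypothetical. One macro declares
the lane typedefs and union for a given width, and the other instantiates a
single function body with the matching shuffle intrinsic and `target`
attribute. The nibble-popcount body is again only an assumed example. Because
`_mm256_shuffle_epi8` looks up within each 128-bit half separately, the table
is repeated in both halves.

```c
#include <immintrin.h>
#include <string.h>

/* Hypothetical: lane typedefs plus the union for a BYTES-wide vector. */
#define DEFINE_UV(SUF, BYTES, MTYPE)                                          \
        typedef unsigned char u8v##SUF  __attribute__((vector_size(BYTES)));  \
        typedef unsigned int  u32v##SUF __attribute__((vector_size(BYTES)));  \
        typedef union { u8v##SUF u8; u32v##SUF u32; MTYPE m; } uv##SUF;

DEFINE_UV(128, 16, __m128i)
DEFINE_UV(256, 32, __m256i)

/* Hypothetical: one source body, instantiated per width with its own
   shuffle intrinsic (SHUF) and per-function ISA target (TARGET). */
#define DEFINE_POPCOUNT_BYTES(SUF, SHUF, TARGET)                              \
        __attribute__((target(TARGET)))                                       \
        void popcount_bytes##SUF(unsigned char *out, const unsigned char *ptr)\
        {                                                                     \
                uv##SUF x, t, lo, hi, lut;                                    \
                /* same 16-entry table in every 128-bit half */               \
                for (unsigned i = 0; i < sizeof lut; i++)                     \
                        lut.u8[i] = __builtin_popcount(i & 15);               \
                memcpy(&x, ptr, sizeof x);                                    \
                t.u32 = x.u32 >> 4;                                           \
                lo.u8 = x.u8 & 15;                                            \
                hi.u8 = t.u8 & 15;                                            \
                lo.m  = SHUF(lut.m, lo.m);                                    \
                hi.m  = SHUF(lut.m, hi.m);                                    \
                x.u8  = lo.u8 + hi.u8;                                        \
                memcpy(out, &x, sizeof x);                                    \
        }

DEFINE_POPCOUNT_BYTES(128, _mm_shuffle_epi8,    "ssse3")
DEFINE_POPCOUNT_BYTES(256, _mm256_shuffle_epi8, "avx2")
```

Both versions are compiled unconditionally. A caller would pick one at run
time, for example with `__builtin_cpu_supports("avx2")`, and must not call
`popcount_bytes256` on a CPU without AVX2.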

Would you like to see the documentation mention this pattern?
