https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88698
Alexander Monakov <amonakov at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amonakov at gcc dot gnu.org

--- Comment #6 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
My recommendation is to use a union like below; this allows writing code using
both generic vectors and intrinsics without casts, and having each operation
show exactly what lane types it operates on:

  typedef unsigned char u8v __attribute__((vector_size(16)));
  typedef unsigned short u16v __attribute__((vector_size(16)));
  typedef unsigned int u32v __attribute__((vector_size(16)));
  typedef union {
    u8v u8;
    u16v u16;
    u32v u32;
    __m128i m;
  } uv;

Example use:

  uv x, t, lo_nib, hi_nib;
  memcpy(&x, ptr, sizeof x);
  t.u32 = x.u32 >> 4;
  lo_nib.u8 = x.u8 & 15;
  hi_nib.u8 = t.u8 & 15;
  lo_nib.m = _mm_shuffle_epi8(lut.m, lo_nib.m);
  hi_nib.m = _mm_shuffle_epi8(lut.m, hi_nib.m);

This also allows writing 256-bit and 128-bit versions together when appropriate
(with help of extra macros for using the right intrinsic function).

Would you like to see the documentation mention this pattern?
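
For anyone who wants to try the pattern, here is a minimal self-contained sketch built around the snippet in the comment. Several things in it are assumptions rather than part of the comment: the function name popcount_bytes is hypothetical, the contents of lut (a per-nibble bit-count table) are a guess at one plausible use, and the final add is only there to make the result observable. It assumes GCC on a little-endian x86 target with SSSE3 available at run time.

```c
#include <assert.h>     /* static_assert (C11) */
#include <string.h>     /* memcpy */
#include <tmmintrin.h>  /* SSSE3: _mm_shuffle_epi8 */

typedef unsigned char  u8v  __attribute__((vector_size(16)));
typedef unsigned short u16v __attribute__((vector_size(16)));
typedef unsigned int   u32v __attribute__((vector_size(16)));
typedef union { u8v u8; u16v u16; u32v u32; __m128i m; } uv;

static_assert(sizeof(uv) == 16, "all union members must be 128 bits wide");

/* Hypothetical helper (not from the comment): writes to out[] the number of
   set bits in each of the 16 bytes at ptr.  The target attribute lets this
   one function use SSSE3 even when the file is built without -mssse3. */
__attribute__((target("ssse3")))
void popcount_bytes(const void *ptr, unsigned char out[16])
{
    /* Assumed table: lut.u8[i] is the number of set bits in nibble value i. */
    const uv lut = { .u8 = { 0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4 } };
    uv x, t, lo_nib, hi_nib, sum;

    memcpy(&x, ptr, sizeof x);    /* unaligned-safe load */
    t.u32 = x.u32 >> 4;           /* shift in 32-bit lanes */
    lo_nib.u8 = x.u8 & 15;        /* low nibble of every byte */
    hi_nib.u8 = t.u8 & 15;        /* high nibble; mask drops bits that
                                     crossed over from the next byte */
    lo_nib.m = _mm_shuffle_epi8(lut.m, lo_nib.m);   /* table lookup */
    hi_nib.m = _mm_shuffle_epi8(lut.m, hi_nib.m);
    sum.u8 = lo_nib.u8 + hi_nib.u8;                 /* generic vector add */
    memcpy(out, &sum, sizeof sum);
}
```

Each statement names the lane type it works on (u32 for the shift, u8 for the masks and the add, m for the intrinsic), which is the readability point the comment makes. Reading a union member other than the one last written relies on the type punning through unions that GCC documents as supported.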