https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92645
--- Comment #20 from Richard Biener <rguenth at gcc dot gnu.org> ---
Small C testcase for one of the patterns we fail to optimize/vectorize:

void foo (char * __restrict src, short * __restrict dest)
{
  union {
    __int128_t i;
    char v[16];
  } u;
  __builtin_memcpy (&u.i, src, 16);
  dest[0] = u.v[0];
  dest[1] = u.v[1];
  dest[2] = u.v[2];
  dest[3] = u.v[3];
  dest[4] = u.v[4];
  dest[5] = u.v[5];
  dest[6] = u.v[6];
  dest[7] = u.v[7];
  dest[8] = u.v[8];
  dest[9] = u.v[9];
  dest[10] = u.v[10];
  dest[11] = u.v[11];
  dest[12] = u.v[12];
  dest[13] = u.v[13];
  dest[14] = u.v[14];
  dest[15] = u.v[15];
}

presents itself as

  _19 = MEM <__int128 unsigned> [(char * {ref-all})src_18(D)];
  _37 = (char) _19;
  _1 = (short int) _37;
  *dest_20(D) = _1;
  _38 = BIT_FIELD_REF <_19, 8, 8>;
  _2 = (short int) _38;
  MEM[(short int *)dest_20(D) + 2B] = _2;
  _39 = BIT_FIELD_REF <_19, 8, 16>;
  _3 = (short int) _39;
  MEM[(short int *)dest_20(D) + 4B] = _3;
...
  _16 = (short int) _52;
  MEM[(short int *)dest_20(D) + 30B] = _16;
  return;

where SLP vectorization is confused about (char) _19 vs. BIT_FIELD_REF
but also wouldn't handle BIT_FIELD_REFs.  It also doesn't vectorize the
store into a store from a CTOR, which forwprop could then pattern-match.
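
For illustration only, here is a rough source-level sketch of the shape one would hope the vectorized code takes: a single 16-byte load, one widening conversion and a single 32-byte store. It is not taken from the bug report. The names foo_vec, v16qi and v16hi are hypothetical, and the sketch assumes a compiler that provides GCC vector extensions and __builtin_convertvector (GCC 9 or later, or Clang). The vector element type is plain char so that the widening follows the target's char signedness, as in the testcase above.

```c
#include <string.h>

/* Hypothetical vector types for this sketch: 16 chars and 16 shorts.  */
typedef char v16qi __attribute__ ((vector_size (16)));
typedef short v16hi __attribute__ ((vector_size (32)));

/* Same observable behavior as foo above, written with explicit vectors.  */
void foo_vec (char * __restrict src, short * __restrict dest)
{
  v16qi in;
  v16hi out;

  /* One 16-byte load instead of sixteen byte extracts.  */
  memcpy (&in, src, sizeof in);

  /* Element-wise char -> short conversion of all 16 lanes.  */
  out = __builtin_convertvector (in, v16hi);

  /* One 32-byte store instead of sixteen scalar stores.  */
  memcpy (dest, &out, sizeof out);
}
```

Whether the compiler turns this into a pair of widening instructions depends on the target and the enabled ISA extensions; on targets without a 32-byte vector mode the operation is lowered to narrower pieces.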