https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96239
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2020-07-20
             Blocks|                            |53947
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
I see at -O2

Coalescing successful!
Merged into 1 stores
16 bit bswap implementation found at: _9
New sequence of 1 stores to replace old one of 2 stores
_9 = from_2(D) r>> 8;
MEM[(union BytesOverlay *)&overlay] = _9;

and

        movl    %edi, %eax
        rolw    $8, %ax
        ret

with -O3 the vectorizer gets in the way and elides overlay:

  _7 = (unsigned char) from_2(D);
  _8 = BIT_FIELD_REF <from_2(D), 8, 8>;
  _9 = {_8, _7};
  _3 = VIEW_CONVERT_EXPR<short unsigned int>(_9);
  overlay ={v} {CLOBBER};
  return _3;

now we could fold the V_C_E of the vector ctor to a bswap on from_2 though
that would be quite a big special pattern.  Maybe vector-ctor folding
can consider this case to convert it to a V_C_E to a vector type from
the bswap result.

OTOH "fixing" the vectorizer to emit a bswap would be even nicer.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
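
For readers without the original testcase at hand (it is not quoted in this message), here is a minimal sketch of the kind of source that could produce the dumps in comment #1. It is a guess, not the reporter's code: only the union name BytesOverlay and the parameter name `from` come from the dumps. The member names, the function names and the checking helper are assumptions made for illustration. The equivalence with a 16-bit rotate by 8 assumes a little-endian target, which matches the x86 assembly shown.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reconstruction: only `BytesOverlay` and `from` appear in
   the dumps; the members `bytes` and `value` are assumed names. */
union BytesOverlay {
    uint8_t bytes[2];
    uint16_t value;
};

/* Two single-byte stores into the overlay, then one 16-bit load. */
uint16_t swap_via_overlay(uint16_t from)
{
    union BytesOverlay overlay;
    overlay.bytes[0] = (uint8_t)(from >> 8); /* high byte, like _8 in the -O3 dump */
    overlay.bytes[1] = (uint8_t)from;        /* low byte, like _7 in the -O3 dump */
    return overlay.value;
}

/* 16-bit rotate by 8, which the -O2 dump prints as `from_2(D) r>> 8`. */
uint16_t rotate_by_8(uint16_t from)
{
    return (uint16_t)((from >> 8) | (from << 8));
}

/* Exhaustive comparison over all 16-bit inputs.
   Assumption: little-endian target; on big-endian the overlay
   function would return its input unchanged. */
void check_all_inputs(void)
{
    for (uint32_t x = 0; x <= UINT16_MAX; ++x)
        assert(swap_via_overlay((uint16_t)x) == rotate_by_8((uint16_t)x));
}
```

Compiling such a file with `gcc -O2 -S` and again with `gcc -O3 -S` should make it possible to compare the generated code for swap_via_overlay against the rolw sequence quoted above, although the result depends on the GCC version and target.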