https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104151
--- Comment #16 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Barnabás Pőcze from comment #15)
> Sorry, I haven't found a better issue. But I think the example below
> exhibits the same or a very similar issue.
>
> I would expect the following code
>
>   void f(unsigned char *p, std::uint32_t x, std::uint32_t y)
>   {
>     p[0] = x >> 24;
>     p[1] = x >> 16;
>     p[2] = x >> 8;
>     p[3] = x >> 0;
>
>     p[4] = y >> 24;
>     p[5] = y >> 16;
>     p[6] = y >> 8;
>     p[7] = y >> 0;
>   }
>
> to be compiled to something along the lines of
>
>   f(unsigned char*, unsigned int, unsigned int):
>     bswap esi
>     bswap edx
>     mov DWORD PTR [rdi], esi
>     mov DWORD PTR [rdi+4], edx
>     ret
>
> however, I get scores of bitwise operations instead if `-fno-tree-vectorize`
> is not specified.
>
> https://gcc.godbolt.org/z/z51K6qorv

Yes, here we vectorize the store:

  <bb 2> [local count: 1073741824]:
  _1 = x_15(D) >> 24;
  _2 = (unsigned char) _1;
  _3 = x_15(D) >> 16;
  _4 = (unsigned char) _3;
  _5 = x_15(D) >> 8;
  _6 = (unsigned char) _5;
  _7 = (unsigned char) x_15(D);
  _8 = y_22(D) >> 24;
  _9 = (unsigned char) _8;
  _10 = y_22(D) >> 16;
  _11 = (unsigned char) _10;
  _12 = y_22(D) >> 8;
  _13 = (unsigned char) _12;
  _14 = (unsigned char) y_22(D);
  _35 = {_2, _4, _6, _7, _9, _11, _13, _14};
  vectp.4_36 = p_17(D);
  MEM <vector(8) unsigned char> [(unsigned char *)vectp.4_36] = _35;

but without vectorizing, the store merging pass (which comes after
vectorization) is able to detect two SImode bswaps. Basically we fail to
consider "generic" vectorization as an option here, and generic vectorization
fails to consider using bswap for permutes of "existing vectors". Likewise we
fail to consider _1, _3, etc. as element accesses of the existing "vectors"
x and y. That would work iff the shift + truncates were canonicalized as
BIT_FIELD_REF, but it's certainly possible to work with the existing IL here.

Note this issue is probably better tracked in a separate bug report.
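
For reference, the transformation the reporter expects can be written out by hand. The following is only a sketch of the source-level equivalent, not what GCC emits internally: store_be32, f_bswap and check_equivalent are hypothetical names introduced here, and __builtin_bswap32 is the GCC/Clang builtin. The byte swap is applied only on little-endian targets (such as the x86-64 target in the report), since on a big-endian target a plain 32-bit store already has the wanted byte order.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Original form from comment #15: eight separate byte stores that
// write x and y in big-endian byte order.
void f(unsigned char *p, std::uint32_t x, std::uint32_t y)
{
    p[0] = x >> 24;
    p[1] = x >> 16;
    p[2] = x >> 8;
    p[3] = x >> 0;

    p[4] = y >> 24;
    p[5] = y >> 16;
    p[6] = y >> 8;
    p[7] = y >> 0;
}

// Hypothetical helper: store v at p in big-endian byte order using one
// 32-bit store. Relies on the GCC/Clang predefined byte-order macros.
static void store_be32(unsigned char *p, std::uint32_t v)
{
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    v = __builtin_bswap32(v);   // reverse the four bytes of v
#endif
    std::memcpy(p, &v, sizeof v);  // unaligned-safe 4-byte store
}

// Hand-written version of the expected result: two swaps, two stores.
void f_bswap(unsigned char *p, std::uint32_t x, std::uint32_t y)
{
    store_be32(p, x);
    store_be32(p + 4, y);
}

// Hypothetical check that both versions write the same eight bytes.
void check_equivalent(std::uint32_t x, std::uint32_t y)
{
    unsigned char a[8], b[8];
    f(a, x, y);
    f_bswap(b, x, y);
    assert(std::memcmp(a, b, sizeof a) == 0);
}
```

Comparing the assembly of f and f_bswap at -O2, with and without -fno-tree-vectorize, is one way to observe the difference described above.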