https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92280
--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Hongtao.liu from comment #6)
> (In reply to Richard Biener from comment #3)
> > That said, VN already computes the partial loads to { 148, _142, _145, _139 }
> > and would insert those CTORs in place of the loads, making the stores and
> > the AVX512 CTOR dead. But that's obviously only profitable if the stores
> > and the CTOR end up being dead, otherwise we risk doing redundant
> > vector construction where cheap loads from memory would be possible.
> > The alternative way expressing it via sub-vector extraction is similarly
> > on the boundary of profitable plus we're happily simplifying that to a
> > redundant CTOR.
>
> What about a rtl version pass_fre, after pass_expand it can be more certain
> to eliminate partial reloads.

Not sure what you are after: combine elides the loads as well, but nothing on
RTL then removes the dead store. There's no classical pass doing "CSE if this
stmt becomes dead", which is what would be needed for optimality.

There could be something like SRA analyzing the accesses (here to 'tmp'), which
could be used either to transform the AVX512 CTOR to { v1, v2, v3, v4 } with
v1 = { _124, _143, _1245, _234 }, etc. (which incidentally is how we end up
constructing such a vector), or to split the store.

Anyway, I am testing a patch.
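To make the trade-off above concrete, here is a small hypothetical C sketch using GCC/Clang generic vector extensions. It is not the testcase from this PR; the function names `via_memory`, `direct` and `check_equivalent` are made up for illustration. `via_memory` mirrors the shape being discussed: one wide (AVX512-sized) vector is built, stored to `tmp`, and reloaded as four 128-bit pieces. `direct` is what you get once each partial load is replaced by a CTOR of the corresponding scalars: the wide CTOR and the store to `tmp` disappear. That is only a win if they really become dead.

```c
#include <assert.h>
#include <string.h>

/* Generic vector types (GCC/Clang extension); 16 and 64 bytes wide. */
typedef int v4si  __attribute__((vector_size(16)));
typedef int v16si __attribute__((vector_size(64)));

/* Hypothetical "before" shape: one wide CTOR, a store to tmp, then four
   partial (128-bit) loads from tmp. */
static void via_memory(const int s[16], v4si out[4])
{
    v16si wide = { s[0],  s[1],  s[2],  s[3],  s[4],  s[5],  s[6],  s[7],
                   s[8],  s[9],  s[10], s[11], s[12], s[13], s[14], s[15] };
    int tmp[16];
    memcpy(tmp, &wide, sizeof tmp);                  /* the wide store */
    for (int i = 0; i < 4; i++)
        memcpy(&out[i], &tmp[4 * i], sizeof out[i]); /* partial loads */
}

/* Hypothetical "after" shape: each partial load replaced by a 4-element
   CTOR of the scalars it would have read, so no wide vector or tmp. */
static void direct(const int s[16], v4si out[4])
{
    for (int i = 0; i < 4; i++) {
        v4si v = { s[4 * i], s[4 * i + 1], s[4 * i + 2], s[4 * i + 3] };
        out[i] = v;
    }
}

/* Both shapes must yield identical sub-vectors. */
static void check_equivalent(const int s[16])
{
    v4si a[4], b[4];
    via_memory(s, a);
    direct(s, b);
    assert(memcmp(a, b, sizeof a) == 0);
}
```

Compiling both functions with `-O2 -mavx512f` and comparing the generated code is one way to see whether the rewrite pays off on a given target.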