On Wed, Aug 7, 2024 at 1:37 PM Alexander Monakov <amona...@ispras.ru> wrote:
>
>
> On Wed, 7 Aug 2024, Richard Biener wrote:
>
> > > > This is probably to work around bugs in older compiler versions? If
> > > > not I agree.
> > >
> > > This is deliberate hand-tuning to avoid a subtle issue: pshufb is not
> > > macro-fused on Intel, so with propagation it is two uops early in the
> > > CPU front-end.
> > >
> > > The "propagation" actually falls out of IRA/LRA decisions, and stopped
> > > happening in gcc-14. I'm not sure if there were relevant RA changes.
> > > In any case, this can potentially flip-flop in the future again.
> > >
> > > Considering the trunk gets this right, I think the next move is to
> > > add a testcase for this, not a PR, correct?
> >
> > Well, merging the memory operand into the pshufb would be wrong - embedded
> > memory ops are always considered aligned, no?
>
> In SSE yes, in AVX no. For search_line_ssse3 the asms help if it is compiled
> with e.g. -march=sandybridge (i.e. for a CPU that has AVX but lacks AVX2):
> then VEX-encoded SSE instructions accept misaligned memory, and we want to
> prevent that here.

Ah, yeah - I think there's even existing bugreports that we're too happy to
duplicate a memory operand even into multiple insns.

Richard.

>
> Alexander
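
For readers following the thread, here is a minimal sketch of the kind of empty asm barrier discussed above. It is not the libcpp code: `match_mask` and `self_check` are hypothetical names, and the sketch uses pcmpeqb (SSE2) instead of pshufb so that it builds on x86-64 without extra -m flags. The idea should carry over to pshufb: with legacy SSE encodings a memory operand must be 16-byte aligned, so the compiler cannot fold an unaligned load into the instruction, while with VEX encodings (e.g. -march=sandybridge) it may, and the asm is there to keep the load separate.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics, baseline on x86-64 */
#include <string.h>

/* Hypothetical helper: return a 16-bit mask with bit i set when
   p[i] == c.  P may be misaligned.  */
static int
match_mask (const unsigned char *p, char c)
{
  /* Explicit unaligned 16-byte load.  */
  __m128i data = _mm_loadu_si128 ((const __m128i *) p);

  /* Empty asm with a read-write SSE register operand ("+x"): the
     compiler must have DATA in a register here, so the load above
     cannot be folded into the compare below as a memory operand.  */
  __asm__ ("" : "+x" (data));

  __m128i eq = _mm_cmpeq_epi8 (data, _mm_set1_epi8 (c));
  return _mm_movemask_epi8 (eq);
}

/* Small usage check on a deliberately misaligned pointer.  */
static void
self_check (void)
{
  unsigned char buf[32];
  memset (buf, 'a', sizeof buf);
  buf[1 + 5] = '\n';
  assert (match_mask (buf + 1, '\n') == (1 << 5));
  assert (match_mask (buf + 1, 'z') == 0);
}
```

Whether the load is actually folded without the asm depends on the compiler version and register allocation, as noted in the thread, so comparing the generated assembly with and without the barrier under -mavx is the way to see its effect.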