On Wed, 7 Aug 2024, Richard Biener wrote:

> > > This is probably to work around bugs in older compiler versions?  If
> > > not I agree.
> >
> > This is deliberate hand-tuning to avoid a subtle issue: pshufb is not
> > macro-fused on Intel, so with propagation it is two uops early in the
> > CPU front-end.
> >
> > The "propagation" actually falls out of IRA/LRA decisions, and stopped
> > happening in gcc-14. I'm not sure if there were relevant RA changes.
> > In any case, this can potentially flip-flop in the future again.
> >
> > Considering the trunk gets this right, I think the next move is to
> > add a testcase for this, not a PR, correct?
> 
> Well, merging the memory operand into the pshufb would be wrong - embedded
> memory ops are always considered aligned, no?

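With legacy SSE encoding yes, with VEX (AVX) encoding no. For
search_line_ssse3 the asms help if it is compiled with e.g.
-march=sandybridge (i.e. for a CPU that has AVX but lacks AVX2): then the
compiler emits VEX-encoded forms of the SSE instructions, which accept
misaligned memory operands, so it is free to fold the unaligned load into
pshufb. The asms are there to prevent that.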
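
To illustrate the pattern, here is a minimal sketch. It is not the actual
libcpp code: shuffle16 and its parameters are hypothetical names, and the
non-x86 branch is only a portable stand-in so the example builds anywhere.
The empty asm forces the loaded value into an SSE register, so the load
cannot become the memory operand of pshufb even under VEX encoding.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <tmmintrin.h>

/* Hypothetical helper: look up 16 index bytes in a 16-byte table with
   pshufb.  Neither pointer needs to be 16-byte aligned.  */
__attribute__ ((target ("ssse3")))
static void
shuffle16 (const unsigned char *table, const unsigned char *idx,
           unsigned char *out)
{
  __m128i lut = _mm_loadu_si128 ((const __m128i *) table);
  /* Explicit unaligned load of the index bytes.  */
  __m128i data = _mm_loadu_si128 ((const __m128i *) idx);
  /* Empty asm with an SSE-register constraint: the compiler has to
     materialize 'data' in a register here, so it cannot fold the load
     into the memory operand of the following pshufb.  */
  __asm__ ("" : "+x" (data));
  __m128i res = _mm_shuffle_epi8 (lut, data);
  _mm_storeu_si128 ((__m128i *) out, res);
}
#else
/* Portable stand-in with pshufb semantics (assumption: only here so the
   sketch compiles on non-x86 hosts).  */
static void
shuffle16 (const unsigned char *table, const unsigned char *idx,
           unsigned char *out)
{
  for (int i = 0; i < 16; i++)
    out[i] = (idx[i] & 0x80) ? 0 : table[idx[i] & 15];
}
#endif
```

Compiling this with and without the asm line under -O2 -mavx and comparing
the disassembly should show whether the load ends up as a separate
instruction or as a memory operand of vpshufb; the exact result depends on
the compiler version, as discussed above.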

Alexander
