On 13/10/2025 2:06 pm, Jan Beulich wrote:
> Along with Zen2 (which doesn't expose ERMS), both families reportedly
> suffer from sub-optimal aliasing detection when deciding whether REP MOVSB
> can actually be carried out the accelerated way. Therefore we want to
> avoid its use in the common case of memcpy(); copy_page_hot() is fine, as
> its two pointers are always going to be having the same low 5 bits.

I think this could be a bit clearer.  How about this:

---8<---
Zen2 (which doesn't expose ERMS) through Zen4 have sub-optimal aliasing
detection for REP MOVS, and fall back to a unit-at-a-time loop when the
two pointers have differing bottom 5 bits.  While both forms are
affected, this makes REP MOVSB 8 times slower than REP MOVSQ.

memcpy() has a high likelihood of encountering this slowpath, so avoid
using REP MOVSB.  This undoes the ERMS optimisation added in commit
d6397bd0e11c, which turns out to be an anti-optimisation on these
microarchitectures.

However, retain the use of ERMS-based REP MOVSB in other cases such as
copy_page_hot() where the parameter alignment is known to avoid the
slowpath.
---8<---

?

This at least gets us back to the 4.20 behaviour.
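
For illustration only, the aliasing condition described above can be
written as a small predicate.  This is a sketch and not code from the
tree: low5_bits_match() and LOW5_MASK are hypothetical names, and the
"low 5 bits" rule is taken from the description above rather than from
any vendor documentation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mask for the low 5 bits compared by the aliasing check. */
#define LOW5_MASK ((uintptr_t)0x1f)

/*
 * Hypothetical helper: true when dst and src share the same low 5 bits,
 * i.e. the case the description above says avoids the slowpath.
 */
static inline bool low5_bits_match(const void *dst, const void *src)
{
    /* XOR leaves a bit set wherever the two addresses differ. */
    return (((uintptr_t)dst ^ (uintptr_t)src) & LOW5_MASK) == 0;
}
```

Two page-aligned pointers always satisfy this, because both have their
low 5 bits clear.  Arbitrary memcpy() arguments usually will not.  The
8x figure follows from the fallback being unit-at-a-time: one byte per
iteration for MOVSB versus eight for MOVSQ.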

~Andrew
