On 07/01/2026 7:22 am, Jan Beulich wrote:
> On 06.01.2026 22:07, Andrew Cooper wrote:
>> On 13/10/2025 2:06 pm, Jan Beulich wrote:
>>> Along with Zen2 (which doesn't expose ERMS), both families reportedly
>>> suffer from sub-optimal aliasing detection when deciding whether REP MOVSB
>>> can actually be carried out the accelerated way. Therefore we want to
>>> avoid its use in the common case of memcpy(); copy_page_hot() is fine, as
>>> its two pointers are always going to be having the same low 5 bits.
>> I think this could be a bit clearer. How about this:
>>
>> ---8<---
>> Zen2 (which doesn't expose ERMS) through Zen4 have sub-optimal aliasing
>> detection for REP MOVS, and fall back to a unit-at-a-time loop when the
>> two pointers have differing bottom 5 bits. While both forms are
>> affected, this makes REP MOVSB 8 times slower than REP MOVSQ.
>>
>> memcpy() has a high likelihood of encountering this slowpath, so avoid
>> using REP MOVSB. This undoes the ERMS optimisation added in commit
>> d6397bd0e11c which turns out to be an anti-optimisation on these
>> microarchitectures.
>>
>> However, retain the use of ERMS-based REP MOVSB in other cases such as
>> copy_page_hot() where there parameter alignment is known to avoid the
>> slowpath.
>> ---8<---
>>
>> ?
> Fine with me; changed. Do I take this as an okay-to-commit?

Yeah - with something to this effect, Reviewed-by: Andrew Cooper <[email protected]>

Sorry it took so long.

~Andrew
