On 06.01.2026 22:07, Andrew Cooper wrote:
> On 13/10/2025 2:06 pm, Jan Beulich wrote:
>> Along with Zen2 (which doesn't expose ERMS), both families reportedly
>> suffer from sub-optimal aliasing detection when deciding whether REP MOVSB
>> can actually be carried out the accelerated way. Therefore we want to
>> avoid its use in the common case of memcpy(); copy_page_hot() is fine, as
>> its two pointers are always going to be having the same low 5 bits.
>
> I think this could be a bit clearer. How about this:
>
> ---8<---
> Zen2 (which doesn't expose ERMS) through Zen4 have sub-optimal aliasing
> detection for REP MOVS, and fall back to a unit-at-a-time loop when the
> two pointers have differing bottom 5 bits. While both forms are
> affected, this makes REP MOVSB 8 times slower than REP MOVSQ.
>
> memcpy() has a high likelihood of encountering this slowpath, so avoid
> using REP MOVSB. This undoes the ERMS optimisation added in commit
> d6397bd0e11c which turns out to be an anti-optimisation on these
> microarchitectures.
>
> However, retain the use of ERMS-based REP MOVSB in other cases such as
> copy_page_hot() where there parameter alignment is known to avoid the
> slowpath.
> ---8<---
>
> ?

Fine with me; changed. Do I take this as an okay-to-commit?

Jan
