On 06.01.2026 22:07, Andrew Cooper wrote:
> On 13/10/2025 2:06 pm, Jan Beulich wrote:
>> Along with Zen2 (which doesn't expose ERMS), both families reportedly
>> suffer from sub-optimal aliasing detection when deciding whether REP MOVSB
>> can actually be carried out the accelerated way. Therefore we want to
>> avoid its use in the common case of memcpy(); copy_page_hot() is fine, as
>> its two pointers always have the same low 5 bits.
> 
> I think this could be a bit clearer.  How about this:
> 
> ---8<---
> Zen2 (which doesn't expose ERMS) through Zen4 have sub-optimal aliasing
> detection for REP MOVS, and fall back to a unit-at-a-time loop when the
> two pointers have differing bottom 5 bits.  While both forms are
> affected, this makes REP MOVSB 8 times slower than REP MOVSQ.
> 
> memcpy() has a high likelihood of encountering this slowpath, so avoid
> using REP MOVSB.  This undoes the ERMS optimisation added in commit
> d6397bd0e11c which turns out to be an anti-optimisation on these
> microarchitectures.
> 
> However, retain the use of ERMS-based REP MOVSB in other cases such as
> copy_page_hot() where the parameter alignment is known to avoid the
> slowpath.
> ---8<---
> 
> ?
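
For illustration only, the condition in the text above (the two pointers
having differing bottom 5 bits) could be sketched as below. This is a
minimal sketch, not part of the patch: low5_match() is a hypothetical
helper name, and the 0x1f mask is simply the "bottom 5 bits" wording
taken literally.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not in the patch): true when dst and src agree
 * in their bottom 5 bits, i.e. the case the text describes as avoiding
 * the unit-at-a-time slowpath.
 */
static bool low5_match(const void *dst, const void *src)
{
    /* XOR leaves a bit set wherever the two addresses differ. */
    return (((uintptr_t)dst ^ (uintptr_t)src) & 0x1f) == 0;
}
```

Two page-aligned pointers, as in the copy_page_hot() case, both have
zero in their low bits and so always satisfy this check, whereas
arbitrary memcpy() arguments need not.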

Fine with me; changed. Do I take this as an okay-to-commit?

Jan
