On Fri, Jan 10, 2025 at 5:58 PM Kees Cook <k...@kernel.org> wrote:
>
> On Thu, Jan 09, 2025 at 11:01:47PM +0100, Mateusz Guzik wrote:
> > That is to say, contrary to the report above, I believe the change is
> > in fact a regression which just so happened to make things faster for
> > a specific case. The unintended speed up can be achieved without
> > regressing anything else by taming the craziness.
>
> How do we best make sense of the perf report? Even in the iter case
> above, it looks like a perf improvement?
>
The kernel without your change, compiled with gcc, is leaving
performance on the table in select cases, namely when gcc elects to use
rep movsq for sizes below a magic threshold (which depends on the
uarch).

Your change has the unintended side effect of switching
copy_page_from_iter_atomic to a plain memcpy call, which just happens
to be the right thing to do for this particular consumer.

However, it also has the side effect of forcing a memcpy call in places
which were optimized just fine. For example, if there is a spot with a
variable number of bytes to copy, but the range is small and the upper
limit is also small, gcc will elect to emit a few movs and be done with
it, which is faster than calling memcpy. That is to say, for spots like
that this is a regression.

In terms of optimizing all of this, the thing to do is to convince gcc
not to emit rep movsq for the known problematic cases, while leaving
alone the places which are already optimized fine.

-- 
Mateusz Guzik <mjguzik gmail.com>
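
For illustration only, here is a minimal userspace sketch of the kind of
spot described above: a copy whose length is variable but clamped to a
small upper bound. The names small_rec and copy_small are hypothetical
and not taken from the kernel tree. Whether gcc actually expands the
copy inline, and how, depends on the compiler version, flags and target.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record with a small payload; not a kernel structure. */
struct small_rec {
	unsigned char data[16];
};

/*
 * The length is variable, but the clamp below bounds it to 16 bytes.
 * With that range known, the compiler is free to expand the copy
 * inline (for example as a few moves) rather than emit a call to
 * memcpy or a rep movsq sequence. This is a possibility, not a
 * guarantee.
 */
static size_t copy_small(struct small_rec *dst, const void *src, size_t len)
{
	if (len > sizeof(dst->data))
		len = sizeof(dst->data);
	memcpy(dst->data, src, len);
	return len;
}
```

Building this with something like "gcc -O2 -S" and reading the
generated assembly is one way to see which strategy was picked for a
given compiler and target.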