https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111143

--- Comment #7 from Paul Eggert <eggert at cs dot ucla.edu> ---
(In reply to Alexander Monakov from comment #6)

> Are you binding the benchmark to some core in particular?

I did the benchmark on performance cores, which was my original use case. On
efficiency cores, adding the (unnecessary) 'mov eax, 1' doesn't change timing
much (0.9% speedup on one test).

> it is better to have 'add rbx, 1' instead of 'add rbx, rax' in this loop on 
> any CPU

Somewhat counterintuitively, that doesn't seem to be the case for the
efficiency cores on this platform, as the "38% faster" code is 7% slower on
E-cores. However, the use cases I'm concerned about are typically run on
performance cores.
