https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111143
--- Comment #7 from Paul Eggert <eggert at cs dot ucla.edu> ---
(In reply to Alexander Monakov from comment #6)
> Are you binding the benchmark to some core in particular?

I did the benchmark on performance cores, which was my original use case. On efficiency cores, adding the (unnecessary) 'mov eax, 1' doesn't change timing much (0.9% speedup on one test).

> it is better to have 'add rbx, 1' instead of 'add rbx, rax' in this loop on
> any CPU

Somewhat counterintuitively, that doesn't seem to be the case for the efficiency cores on this platform, as the "38% faster" code is 7% slower on E-cores. However, the use cases I'm concerned about are typically run on performance cores.