https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111143
--- Comment #5 from Paul Eggert <eggert at cs dot ucla.edu> ---
(In reply to Alexander Monakov from comment #4)
> To evaluate scheduling aspect, keep 'mov eax, 1' while changing 'add rbx,
> rax' to 'add rbx, 1'.

Adding the (unnecessary) 'mov eax, 1' doesn't affect the timing much, which is
what I would expect on a newer processor. When I reran the benchmark on the
same laptop (Intel i5-1335U), I got 3.289s for GCC-generated code, 2.256s for
the "38% faster" code (now it's 46% faster; don't know why) and 2.260s for the
faster code with the unnecessary 'mov eax, 1' inserted.
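
For anyone checking the arithmetic, here is a small sketch of how the percentages follow from the three timings quoted above. It assumes "N% faster" means the ratio of run times minus one; the function and variable names are made up for illustration and are not part of the benchmark.

```python
def speedup_percent(baseline_s: float, candidate_s: float) -> float:
    """Percent by which the candidate is faster than the baseline.

    Assumption: "faster" is measured as baseline_time / candidate_time - 1.
    """
    return (baseline_s / candidate_s - 1.0) * 100.0


# Timings in seconds, copied from the comment above.
gcc_s = 3.289            # GCC-generated code
fast_s = 2.256           # hand-modified faster code
fast_with_mov_s = 2.260  # faster code plus the unnecessary 'mov eax, 1'

# Speedup of each fast variant relative to the GCC-generated code.
fast_pct = speedup_percent(gcc_s, fast_s)
fast_with_mov_pct = speedup_percent(gcc_s, fast_with_mov_s)

# Relative cost of the extra 'mov eax, 1' between the two fast variants.
mov_cost_pct = (fast_with_mov_s / fast_s - 1.0) * 100.0

print(f"faster code:          {fast_pct:.1f}% faster than GCC output")
print(f"faster code with mov: {fast_with_mov_pct:.1f}% faster than GCC output")
print(f"cost of extra mov:    {mov_cost_pct:.2f}%")
```

Under that definition the first figure rounds to the 46% mentioned above, and the extra 'mov eax, 1' costs well under 1%. A single run per variant says nothing about run-to-run variation, so a difference that small may not be meaningful.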