https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81914
--- Comment #3 from Daniel Fruzynski <bugzi...@poradnik-webmastera.com> ---
Yes, the branchless version is faster. Here are the results for code compiled
with gcc 4.8.5:

Benchmark              Time           CPU Iterations
--------------------------------------------------
BM_memcmp              6 ns          6 ns  111001949
BM_int64cmp            4 ns          4 ns  183761752

And here for gcc 7.1.0:

Benchmark              Time           CPU Iterations
--------------------------------------------------
BM_memcmp              6 ns          6 ns  113198940
BM_int64cmp            8 ns          8 ns   82036754

Note: The tested code accepted values passed via pointer instead of by value,
to better compare it with the memcmp function. The functions compared random
values, generated before the tests and stored in arrays. srand was called with
a constant value to get repeatable results.
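
For readers who want to reproduce a setup like the one described in the note, here is a minimal sketch. It is not the code that was actually benchmarked: the function bodies and the names int64cmp, memcmp_cmp, make_inputs and check_equality_agreement are assumptions made for illustration, and the benchmark harness itself is left out. It only shows the three ingredients mentioned above: comparison through pointers, inputs generated up front into arrays, and a constant srand seed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <vector>

// Hypothetical three-way comparison written without explicit branches.
// Arguments are taken by pointer to mirror the memcmp calling convention.
int int64cmp(const std::int64_t* a, const std::int64_t* b) {
    return (*a > *b) - (*a < *b);  // yields -1, 0 or 1
}

// Hypothetical memcmp-based counterpart. Note that memcmp orders bytes,
// not numeric values, so only the "equal" result is guaranteed to agree
// with int64cmp.
int memcmp_cmp(const std::int64_t* a, const std::int64_t* b) {
    return std::memcmp(a, b, sizeof(std::int64_t));
}

// Generate the inputs before any timing starts. A constant seed makes
// every run see the same sequence of values.
std::vector<std::int64_t> make_inputs(std::size_t n, unsigned seed) {
    std::srand(seed);
    std::vector<std::int64_t> v(n);
    for (auto& x : v) {
        // Combine two rand() results to fill more than 31 bits.
        x = (static_cast<std::int64_t>(std::rand()) << 31) ^ std::rand();
    }
    return v;
}

// Sanity check: both comparators must agree on whether two values are equal.
void check_equality_agreement(const std::vector<std::int64_t>& lhs,
                              const std::vector<std::int64_t>& rhs) {
    assert(lhs.size() == rhs.size());
    for (std::size_t i = 0; i < lhs.size(); ++i) {
        assert((int64cmp(&lhs[i], &rhs[i]) == 0) ==
               (memcmp_cmp(&lhs[i], &rhs[i]) == 0));
    }
}
```

A timing loop would then walk two such arrays (built with different seeds) and call one comparator per element pair, so that random-number generation stays out of the measured region.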