https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65474
--- Comment #3 from wmi at google dot com ---
Thanks. You are right. I wrote a microbenchmark (attached) and tested it on different Intel microarchitectures.

westmere:
1.gcc.out: 19.42
1.llvm.out: 19.32

sandybridge:
1.gcc.out: 18.61
1.llvm.out: 19.16

ivybridge:
1.gcc.out: 15.79
1.llvm.out: 15.87

On sandybridge, llvm's version was slower. On the other microarchitectures, the two versions were close to each other. So gcc's choice makes sense.
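
To put the gaps in relative terms, here is a small sketch that computes how much slower (positive) or faster (negative) the llvm binary was compared with the gcc binary on each microarchitecture. It assumes the reported numbers are run times where lower is better; the comment does not state the unit. The names `results` and `rel_diff_percent` are made up for this illustration and are not part of the attached benchmark.

```python
# Numbers copied from the comment above: (gcc, llvm) per microarchitecture.
# Assumption: these are run times, so a larger value means slower.
results = {
    "westmere": (19.42, 19.32),
    "sandybridge": (18.61, 19.16),
    "ivybridge": (15.79, 15.87),
}


def rel_diff_percent(gcc, llvm):
    """Return the llvm result relative to gcc, in percent.

    Positive means llvm took longer than gcc; negative means it was faster.
    """
    return (llvm - gcc) / gcc * 100.0


# Hypothetical helper table: relative difference per microarchitecture.
diffs = {arch: rel_diff_percent(gcc, llvm) for arch, (gcc, llvm) in results.items()}

for arch, pct in diffs.items():
    print(f"{arch}: llvm vs gcc {pct:+.2f}%")
```

Under that assumption, the sandybridge gap comes out at roughly 3%, while westmere and ivybridge differ by only about half a percent in either direction.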