https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106323
Wilco <wilco at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wilco at gcc dot gnu.org

--- Comment #3 from Wilco <wilco at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #1)
> GCC might be better if the first bytes are in cache but the next bytes are
> not and then branch is predictable (which it might be).
>
> So this is much more complex than just changing this really.

Neither sequence is efficient. Caches are not really relevant here; what
matters is giving a wide out-of-order (OoO) core lots of useful parallel work
to do, which means avoiding unnecessary instructions and branches that just
slow it down. Hence 4 loads and CMP+CCMP is best.
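For illustration, here is a minimal C sketch of the branch-free shape described above. It assumes the comparison under discussion is a 16-byte equality test, with two 8-byte words per operand (hence 4 loads). The function name `eq16` is hypothetical. Both word compares are computed unconditionally and combined with a non-short-circuit `&`, so there is no data-dependent branch. On AArch64 a compiler may lower this to CMP + CCMP + CSET, or to an equivalent EOR/ORR sequence. The exact output depends on compiler version and flags.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the bug report): returns true if the
   16 bytes at a and b are equal, i.e. memcmp(a, b, 16) == 0.
   Assumes both buffers are at least 16 bytes long. */
bool eq16(const void *a, const void *b)
{
    uint64_t a0, a1, b0, b1;

    /* memcpy with a constant size is the portable way to express an
       unaligned 8-byte load; optimizing compilers emit a single load. */
    memcpy(&a0, a, 8);
    memcpy(&a1, (const char *)a + 8, 8);
    memcpy(&b0, b, 8);
    memcpy(&b1, (const char *)b + 8, 8);

    /* Bitwise & (not &&): both comparisons are always evaluated, so
       there is no early-exit branch after the first word. */
    return (a0 == b0) & (a1 == b1);
}
```

Writing `&&` instead would permit, though not require, an early-exit branch after the first word. That is the kind of branchy sequence the comment argues against.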