https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93165
--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Alexander Monakov from comment #3)
> So perhaps an unpopular opinion, but I'd say a
> __builtin_branchless_select(c, a, b) (guaranteed to live throughout
> optimization pipeline as a non-branchy COND_EXPR) is badly missing.

I am going to say otherwise. Much of the time a conditional move is faster than a branch, even if the branch is predictable (there are a few exceptions), on most non-Intel/AMD targets. This is because the conditional move takes just one cycle, and a branch takes one cycle only when it is predicted correctly.

It is even worse when doing things like:
if (a && b)
where on aarch64, this can be done using only one cmp followed by one ccmp.

NOTE on PowerPC, you could in theory use crand/cror (though this is not done currently and I don't know if they are profitable in any recent design).

Plus aarch64 has conditional add and a few other things which improve the idea of a conditional move.
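
For reference, the two shapes discussed above can be written as plain C. This is only a minimal sketch: the function names (select_int, both_nonzero, self_check) are hypothetical and not taken from the bug report, and whether a given compiler emits a conditional select, a cmp/ccmp pair, or a branch for them depends on the target, the compiler version, and the optimization flags.

```c
#include <assert.h>

/* Hypothetical helper: a plain C select. Whether this becomes a conditional
   move/select (for example csel on aarch64) or a branch is the compiler's
   choice; nothing here forces a branchless form. */
int select_int(int c, int a, int b)
{
    return c ? a : b;
}

/* Hypothetical helper for the `if (a && b)` case from the comment. On
   aarch64 a compiler may lower the test to cmp followed by ccmp, but that
   is an assumption about codegen, not a guarantee. */
int both_nonzero(int a, int b)
{
    if (a && b)
        return 1;
    return 0;
}

/* Checks the semantics only; it says nothing about the generated code. */
void self_check(void)
{
    assert(select_int(1, 10, 20) == 10);
    assert(select_int(0, 10, 20) == 20);
    assert(both_nonzero(3, 4) == 1);
    assert(both_nonzero(3, 0) == 0);
    assert(both_nonzero(0, 0) == 0);
}
```

To see what a particular toolchain does with these, one option is to compile with -O2 -S for the target of interest and read the assembly for the two functions.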