https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93165

--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Alexander Monakov from comment #3)
> So perhaps an unpopular opinion, but I'd say a
> __builtin_branchless_select(c, a, b) (guaranteed to live throughout
> optimization pipeline as a non-branchy COND_EXPR) is badly missing.

I am going to say otherwise.  Much of the time a conditional move is faster
than a branch, even if the branch is predictable, on most non-Intel/AMD
targets (there are a few exceptions).  This is because the conditional move is
just one cycle, and a branch is also one cycle only when it is "predictable".
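As a minimal sketch of the kind of source in question (select_int is a
hypothetical name, not something from GCC), a plain ternary already leaves the
compiler free to pick a conditional move or a branch; which one it emits
depends on the target and the optimization level:

```c
/* Hypothetical helper for illustration only.  The compiler may lower
   this to a conditional move (for example csel on aarch64) or to a
   branch; nothing in the source forces either choice. */
static inline int select_int(int c, int a, int b)
{
    return c ? a : b;
}
```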

It is even worse for branches when doing things like:
if (a && b)
where on aarch64 this can be done using only one cmp followed by one ccmp.
Note: on PowerPC you could in theory use crand/cror (though this is not done
currently, and I don't know if they are profitable in any recent design).
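A small sketch of that case, assuming two plain int operands (both_nonzero is
a hypothetical name).  The assembly in the comment is only what an aarch64
compiler might plausibly produce at -O2, not a guaranteed output:

```c
/* Hypothetical example of a short-circuit "and" on two integers.
   On aarch64 this might compile to something like:
       cmp   w0, 0
       ccmp  w1, 0, 4, ne
       cset  w0, ne
   that is, one cmp, one ccmp, and no branch. */
int both_nonzero(int a, int b)
{
    if (a && b)
        return 1;
    return 0;
}
```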

Plus aarch64 has conditional add and a few other conditional instructions which
extend the idea of a conditional move.
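For instance, assuming "conditional add" here refers to the conditional
increment form (csinc and its cinc alias), a sketch like the following
(bump_if is a hypothetical name) is the sort of code that can map onto it:

```c
/* Hypothetical example: add one to x only when c is nonzero.
   On aarch64 the compiler may use a conditional increment
   (cinc, an alias of csinc) rather than a branch. */
int bump_if(int x, int c)
{
    return c ? x + 1 : x;
}
```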
