https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50168
Aliaksei Kandratsenka changed:
What|Removed |Added
CC||alkondratenko at gmail dot com
--- Comment #10 from Aliaksei Kandratsenka ---
There is a similar issue with bsr and __builtin_clz.
It looks like GCC implements __builtin_clz as 31 - bsr. And 31 - __builtin_clz
does get optimized to a plain bsr, but only under -march=haswell or
later AMD CPUs.
For earlier CPUs it generates two redundant 31 - arg computations.
This is easy to play with at: https://godbolt.org/g/o7gNSS
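
For readers without access to the link, here is a minimal sketch of the kind of
function being discussed. It is not the exact test case from the link; the name
highest_bit_index is hypothetical, and the constant 31 assumes a 32-bit
unsigned int.

```c
/* Hypothetical helper: index of the highest set bit, i.e. the value bsr
   produces. Assumes unsigned int is 32 bits wide. The result is undefined
   for x == 0, as with __builtin_clz itself. */
static inline unsigned highest_bit_index(unsigned x)
{
    /* Ideally this whole expression becomes a single bsr instruction. */
    return 31 - __builtin_clz(x);
}
```

Compiling this with and without -march=haswell and comparing the assembly
should show whether the extra 31 - arg computations are present.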
Clang doesn't have that same problem (but it has another one: under
-march=haswell it sometimes prefers lzcnt too strongly, which returns a
different result and thus requires extra computation).
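
To make the "different result" concrete: for a nonzero 32-bit value, bsr yields
the index of the highest set bit, while lzcnt yields the number of leading zero
bits, so the two are related by bsr = 31 - lzcnt. The portable reference
functions below only illustrate that relationship; their names are hypothetical
and they assume a 32-bit unsigned int.

```c
/* Reference for what bsr computes: index of the highest set bit.
   Hypothetical name; x must be nonzero (bsr leaves its destination
   undefined for a zero source). */
static unsigned bsr_reference(unsigned x)
{
    unsigned index = 0;
    while (x >>= 1)
        index++;
    return index;
}

/* Reference for what lzcnt computes on a 32-bit operand: the number of
   leading zero bits. Unlike bsr, it is defined for 0 and returns 32. */
static unsigned lzcnt_reference(unsigned x)
{
    unsigned count = 0;
    for (unsigned mask = 0x80000000u; mask != 0 && (x & mask) == 0; mask >>= 1)
        count++;
    return count;
}
```

So code that wants the bsr value but is compiled to lzcnt needs an extra
31 - result step (or an equivalent xor with 31).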