https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117266
--- Comment #12 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to H. Peter Anvin from comment #6)
> And THAT is exactly the point: *the two aren't equivalent.* Only the
> programmer knows when this instruction is usable, and for performance
> reasons, you *really, really* want to be able to use it when you as the
> programmer know, a priori, that you can.

Actually the compiler could know based on the ranges. And it could technically
optimize something like what you gave for div2 into the instruction.
Like say:
```
typedef unsigned _BitInt(64) uint64_t;

uint64_t div2(uint64_t hi, uint64_t lo, uint64_t divisor)
{
  unsigned _BitInt(128) dividend = ((unsigned _BitInt(128))hi << 64) | lo;
  unsigned _BitInt(128) qq = dividend / divisor;
  /* Tell the compiler the quotient fits in 64 bits. */
  if (qq >> 64)
    __builtin_unreachable();
  return qq;
}
```

This could be optimized to use the 128/64->64 instruction, since you say the
upper bits of the quotient are 0; otherwise the behavior is undefined. Note that
a trap here could be how the undefined behavior shows up.
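
For reference, the range condition can also be stated on the inputs: the quotient of (hi:lo) / divisor fits in 64 bits exactly when hi < divisor, which also rules out divisor == 0. A minimal sketch of that variant follows, assuming a 64-bit target where GCC or Clang provide `unsigned __int128`. The names `u128`, `quotient_fits` and `div2_hint` are made up for illustration, and this does not claim that any particular GCC version emits the narrowing divide for it.

```c
#include <stdint.h>

/* Assumption: unsigned __int128 is available (GCC/Clang, 64-bit targets). */
typedef unsigned __int128 u128;

/* Hypothetical helper: nonzero iff (hi:lo) / divisor fits in 64 bits.
   hi < divisor implies divisor != 0, so this also excludes division by zero. */
int quotient_fits(uint64_t hi, uint64_t divisor)
{
  return hi < divisor;
}

/* Same idea as div2 above, but the range is stated on the inputs.
   Calling this with hi >= divisor is undefined behavior. */
uint64_t div2_hint(uint64_t hi, uint64_t lo, uint64_t divisor)
{
  if (hi >= divisor)
    __builtin_unreachable();
  u128 dividend = ((u128)hi << 64) | lo;
  return (uint64_t)(dividend / divisor);
}
```

A caller that cannot prove the precondition would test `quotient_fits(hi, divisor)` first and take a slower full-width path otherwise.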