https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117266
Bug ID: 117266 Summary: RFE: builtins for N*N -> 2N multiplication and 2N/N -> N div/mod Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: hpa at zytor dot com Target Milestone: --- As has pointed out in e.g. bug 82677, these operations are fairly commonly used in multi-precision arithmetic and cryptography. These are currently often implemented in inline asm because of lack of compiler support. This is not only inherently unportable across either architectures(*) or compilers and makes it impossible for the compiler to optimize, but leads to other problems, including the coding problems described in bug 82677, and the following. The multiplication can be fudged by casting to a wider data type before the multiply, but: (a) requires knowing the width of the type, which may be a typedef or macro argument; (b) is guaranteed to be unavailable for the largest integer type supported, which is machine- and compiler-dependent. I would like to suggest the following type-independent builtin: TYPE3 __builtin_mulh(TYPE1 a, TYPE2 b) ... to return the upper half of the multiplication. TYPE3 is signed if either TYPE1 or TYPE2 is signed. (I assume the compiler is smart enough to be able to apply common subexpression elimination with the low-half multiply.) The division/remainder case is even uglier. Here the widening trick doesn't work, because it changes the trapping behavior. As a result, gcc often ends up calling a libgcc routine, which is almost certainly *not* what the user wanted in their carefully tuned code. Here I suggest: TYPE __builtin_div2(TYPE hi, TYPE lo, TYPE divisor) TYPE __builtin_rem2(TYPE hi, TYPE lo, TYPE divisor) Here is may be necessary to coerce all the types to the same type, as in normal two-operand integer arithmetic, except that the signedness of "lo" arguably doesn't matter. Note that I'm NOT including extended-precision addition and subtraction support here. The reason is that the carry-out from addition or subtraction is easily generated using standard idioms, and I believe gcc already recognize them: unsigned TYPE x, y; /* signed types are a little trickier, but doable */ unsigned TYPE a = x + y bool carry = a < x; /* or a < y */ unsigned TYPE b = x - y; bool borrow = x < y; /* or b > x */ Ideally this should eventually make it into the C standard, of course, but they generally want an implementation first to reference. (*) The implementation strategies vary among architectures, which is another reason the inline assembly solution is a real headache. For example: - x86: special multiply instructions that output both high and low value in different registers; - SPARC: a special "y" register that receives the high half of a multiply; - MIPS/RISC-V: separate instructions for high and low multiply, *BUT* the mulh* instructions may forward the low half result to a mul instruction that uses the same register operands.