https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117266

            Bug ID: 117266
           Summary: RFE: builtins for N*N -> 2N multiplication and 2N/N ->
                    N div/mod
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: hpa at zytor dot com
  Target Milestone: ---

As has pointed out in e.g. bug 82677, these operations are fairly commonly used
in multi-precision arithmetic and cryptography.

These are currently often implemented in inline asm because of lack of compiler
support. This is not only inherently unportable across either architectures(*)
or compilers and makes it impossible for the compiler to optimize, but leads to
other problems, including the coding problems described in bug 82677, and the
following.

The multiplication can be fudged by casting to a wider data type before the
multiply, but:

(a) requires knowing the width of the type, which may be a typedef or macro
argument;

(b) is guaranteed to be unavailable for the largest integer type supported,
which is machine- and compiler-dependent.

I would like to suggest the following type-independent builtin:

   TYPE3 __builtin_mulh(TYPE1 a, TYPE2 b)

... to return the upper half of the multiplication. TYPE3 is signed if either
TYPE1 or TYPE2 is signed.

(I assume the compiler is smart enough to be able to apply common subexpression
elimination with the low-half multiply.)

The division/remainder case is even uglier. Here the widening trick doesn't
work, because it changes the trapping behavior. As a result, gcc often ends up
calling a libgcc routine, which is almost certainly *not* what the user wanted
in their carefully tuned code.

Here I suggest:

    TYPE __builtin_div2(TYPE hi, TYPE lo, TYPE divisor)
    TYPE __builtin_rem2(TYPE hi, TYPE lo, TYPE divisor)

Here is may be necessary to coerce all the types to the same type, as in normal
two-operand integer arithmetic, except that the signedness of "lo" arguably
doesn't matter.

Note that I'm NOT including extended-precision addition and subtraction support
here. The reason is that the carry-out from addition or subtraction is easily
generated using standard idioms, and I believe gcc already recognize them:

    unsigned TYPE x, y; /* signed types are a little trickier, but doable */

    unsigned TYPE a = x + y
    bool carry = a < x;  /* or a < y */

    unsigned TYPE b = x - y;
    bool borrow = x < y; /* or b > x */

Ideally this should eventually make it into the C standard, of course, but they
generally want an implementation first to reference.



(*) The implementation strategies vary among architectures, which is another
reason the inline assembly solution is a real headache. For example:

- x86: special multiply instructions that output both high and low value in
different registers;

- SPARC: a special "y" register that receives the high half of a multiply;

- MIPS/RISC-V: separate instructions for high and low multiply, *BUT* the mulh*
instructions may forward the low half result to a mul instruction that uses the
same register operands.

Reply via email to