On Mon, 1 Jul 2024 at 20:56, Joel Jacobson <j...@compiler.org> wrote:
>
> Below is a more realistic benchmark
>
> CREATE TABLE bench_mul_var (num1 numeric, num2 numeric);
>
> INSERT INTO bench_mul_var (num1, num2)
> SELECT random(0::numeric,1e8::numeric), random(0::numeric,1e8::numeric) FROM
> generate_series(1,1e8);
>
> SELECT SUM(num1*num2) FROM bench_mul_var;
I had a play with this, and came up with a slightly different way of
doing it that works for var2 of any size, as long as var1 is just 1 or
2 digits.

Repeating your benchmark, where both numbers have up to 2 NBASE-digits,
this new approach was slightly faster:

SELECT SUM(num1*num2) FROM bench_mul_var; -- HEAD
Time: 4762.990 ms (00:04.763)
Time: 4332.166 ms (00:04.332)
Time: 4276.211 ms (00:04.276)
Time: 4247.321 ms (00:04.247)
Time: 4166.738 ms (00:04.167)

SELECT SUM(num1*num2) FROM bench_mul_var; -- v2 patch
Time: 4398.812 ms (00:04.399)
Time: 3672.668 ms (00:03.673)
Time: 3650.227 ms (00:03.650)
Time: 3611.420 ms (00:03.611)
Time: 3534.218 ms (00:03.534)

SELECT SUM(num1*num2) FROM bench_mul_var; -- this patch
Time: 3350.596 ms (00:03.351)
Time: 3336.224 ms (00:03.336)
Time: 3335.599 ms (00:03.336)
Time: 3336.990 ms (00:03.337)
Time: 3351.453 ms (00:03.351)

(This was on an older Intel Core i9-9900K, so I'm not sure why all the
timings are faster. What compiler settings are you using?)

The approach taken in this patch only uses 32-bit integers, so in
theory it could be extended to work for var1ndigits = 3, 4, or even
more, but the code would get increasingly complex, and I ran out of
steam at 2 digits. It might be worth trying though.

Regards,
Dean
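PS. The 32-bit-only claim is easy to justify with a back-of-the-envelope
bound (mine, not anything taken from the patch): with NBASE = 10000, the
carry never exceeds var1ndigits * (NBASE - 1), so each inner sum is at most

    var1ndigits * (NBASE - 1)^2 + var1ndigits * (NBASE - 1)
        = var1ndigits * (NBASE - 1) * NBASE
        < var1ndigits * 10^8

which stays below INT_MAX (2147483647) for var1ndigits up to about 21.
So var1ndigits = 3 or 4 would be comfortably safe from overflow; the
obstacle is only the code complexity, not the arithmetic.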
diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c
new file mode 100644
index 5510a20..adbfd5c
--- a/src/backend/utils/adt/numeric.c
+++ b/src/backend/utils/adt/numeric.c
@@ -8748,6 +8748,74 @@ mul_var(const NumericVar *var1, const Nu
 	}
 
 	/*
+	 * Simplified fast-path computation, if var1 has just one or two digits.
+	 * This is significantly faster, since it avoids allocating a separate
+	 * digit array, making multiple passes over var2, and having separate
+	 * carry-propagation passes.
+	 */
+	if (var1ndigits <= 2)
+	{
+		NumericDigit *res_buf;
+
+		/* Allocate result digit array */
+		res_buf = digitbuf_alloc(res_ndigits);
+		res_buf[0] = 0;				/* spare digit for later rounding */
+		res_digits = res_buf + 1;
+
+		/*
+		 * Compute the result digits directly, in one pass, propagating the
+		 * carry up as we go.
+		 */
+		switch (var1ndigits)
+		{
+			case 1:
+				carry = 0;
+				for (i = res_ndigits - 3; i >= 0; i--)
+				{
+					newdig = (int) var1digits[0] * var2digits[i] + carry;
+					res_digits[i + 1] = (NumericDigit) (newdig % NBASE);
+					carry = newdig / NBASE;
+				}
+				res_digits[0] = (NumericDigit) carry;
+				break;
+
+			case 2:
+				newdig = (int) var1digits[1] * var2digits[res_ndigits - 4];
+				res_digits[res_ndigits - 2] = (NumericDigit) (newdig % NBASE);
+				carry = newdig / NBASE;
+
+				for (i = res_ndigits - 4; i > 0; i--)
+				{
+					newdig = (int) var1digits[0] * var2digits[i] +
+						(int) var1digits[1] * var2digits[i - 1] + carry;
+					res_digits[i + 1] = (NumericDigit) (newdig % NBASE);
+					carry = newdig / NBASE;
+				}
+
+				newdig = (int) var1digits[0] * var2digits[0] + carry;
+				res_digits[1] = (NumericDigit) (newdig % NBASE);
+				res_digits[0] = (NumericDigit) (newdig / NBASE);
+				break;
+		}
+
+		/* Store the product in result (minus extra rounding digit) */
+		digitbuf_free(result->buf);
+		result->ndigits = res_ndigits - 1;
+		result->buf = res_buf;
+		result->digits = res_digits;
+		result->weight = res_weight - 1;
+		result->sign = res_sign;
+
+		/* Round to target rscale (and set result->dscale) */
+		round_var(result, rscale);
+
+		/* Strip leading and trailing zeroes */
+		strip_var(result);
+
+		return;
+	}
+
+	/*
 	 * We do the arithmetic in an array "dig[]" of signed int's.  Since
 	 * INT_MAX is noticeably larger than NBASE*NBASE, this gives us headroom
 	 * to avoid normalizing carries immediately.
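For anyone who wants to play with the digit indexing outside of
numeric.c, here is a rough standalone sketch of the 2-digit single-pass
technique (not part of the patch; NBASE, NumericDigit and the function
name are redefined locally, and the spare rounding digit, round_var()
and strip_var() steps are left out, so the result indexing is shifted
by one relative to the patch):

#include <stdio.h>
#include <stdint.h>

#define NBASE 10000
typedef int16_t NumericDigit;

/*
 * Multiply a 2-digit var1 by an n-digit var2, most-significant-digit
 * first, in one pass with carry propagation; res_digits must have
 * room for n + 2 digits.
 */
static void
mul_var1x2_sketch(const NumericDigit *var1digits,
				  const NumericDigit *var2digits, int var2ndigits,
				  NumericDigit *res_digits)
{
	int			res_ndigits = var2ndigits + 2;
	int			newdig;
	int			carry;
	int			i;

	/* Last result digit: only var1digits[1] contributes */
	newdig = (int) var1digits[1] * var2digits[var2ndigits - 1];
	res_digits[res_ndigits - 1] = (NumericDigit) (newdig % NBASE);
	carry = newdig / NBASE;

	/* Middle digits: two digit products plus the carry still fit in an int */
	for (i = var2ndigits - 1; i > 0; i--)
	{
		newdig = (int) var1digits[0] * var2digits[i] +
			(int) var1digits[1] * var2digits[i - 1] + carry;
		res_digits[i + 1] = (NumericDigit) (newdig % NBASE);
		carry = newdig / NBASE;
	}

	/* First two result digits: only var1digits[0] contributes */
	newdig = (int) var1digits[0] * var2digits[0] + carry;
	res_digits[1] = (NumericDigit) (newdig % NBASE);
	res_digits[0] = (NumericDigit) (newdig / NBASE);
}

int
main(void)
{
	/* 12345678 * 98765432 = 1219326221002896 */
	NumericDigit var1[] = {1234, 5678};
	NumericDigit var2[] = {9876, 5432};
	NumericDigit res[4];

	mul_var1x2_sketch(var1, var2, 2, res);
	printf("%d %04d %04d %04d\n", res[0], res[1], res[2], res[3]);
	/* prints: 1219 3262 2100 2896 */
	return 0;
}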