Re: Optimize numeric multiplication for one and two base-NBASE digit multiplicands.

Joel Jacobson Sat, 06 Jul 2024 04:17:52 -0700

On Sat, Jul 6, 2024, at 11:34, Dean Rasheed wrote:
> On Fri, 5 Jul 2024 at 18:37, Joel Jacobson <j...@compiler.org> wrote:
>>
>> On Fri, Jul 5, 2024, at 18:42, Joel Jacobson wrote:
>> > Very nice, v7-optimize-numeric-mul_var-small-var1-arbitrary-var2.patch
>> > is now the winner on all my CPUs:
>>
>> I thought it would be interesting to also measure the isolated effect
>> on just numeric_mul() without the query overhead.
>>
>> Impressive speed-up, between 25% - 81%.
>>
>
> Cool. I think we should go with the mul_var_small() patch then, since
> it's more generally applicable.


I agree.

> I also did some testing with much larger var2 values, and saw similar
> speed-ups. One high-level function that benefits from that is
> factorial(), which accepts inputs up to 32177, and so uses both the
> 1-digit and 2-digit code with very large var2 values. I doubt anyone
> actually uses it with such large inputs, but it's interesting
> nonetheless:
>
> SELECT factorial(32177);
> Time: 923.117 ms  -- HEAD
> Time: 534.375 ms  -- mul_var_small() patch

Nice!

> I did one more round of (mostly cosmetic) copy-editing. Aside from
> improving some of the comments, it occurred to me that there's no need
> to pass rscale to mul_var_small(), or for it to call round_var(),
> since it's always computing the exact result. That shaves off a few
> more cycles.

Nice, also cleaner.

> Additionally, I didn't like how res_weight and res_ndigits were being
> set 1 higher than they needed to be. That makes sense in mul_var()
> because it may round the result, causing a non-zero carry to propagate
> into the next digit up, but it's just confusing in mul_var_small(). So
> I've reduced those by 1, which makes the look much more logical. To be
> clear, this doesn't change how many digits we're calculating. But now
> res_ndigits is actually the number of digits being calculated, whereas
> before, res_ndigits was 1 larger and we were calculating res_ndigits -
> 1 digits, which was confusing.

Nice, much cleaner.

> I think this is good to go, so unless there are any further comments,
> I plan to commit it soon.

LGTM.

Benchmark, only on Apple M3 Max:

SELECT SUM(var1*var2) FROM bench_mul_var_var1ndigits_1; -- HEAD
Time: 3042.157 ms (00:03.042)
Time: 3027.711 ms (00:03.028)
Time: 3078.215 ms (00:03.078)

SELECT SUM(var1*var2) FROM bench_mul_var_var1ndigits_1; -- v8
Time: 2700.676 ms (00:02.701)
Time: 2713.594 ms (00:02.714)
Time: 2704.139 ms (00:02.704)

SELECT SUM(var1*var2) FROM bench_mul_var_var1ndigits_2; -- HEAD
Time: 4506.064 ms (00:04.506)
Time: 3316.204 ms (00:03.316)
Time: 3321.086 ms (00:03.321)

SELECT SUM(var1*var2) FROM bench_mul_var_var1ndigits_2; -- v8
Time: 2904.786 ms (00:02.905)
Time: 2921.996 ms (00:02.922)
Time: 2919.269 ms (00:02.919)

SELECT SUM(var1*var2) FROM bench_mul_var_var1ndigits_3; -- HEAD
Time: 4636.051 ms (00:04.636)
Time: 3439.951 ms (00:03.440)
Time: 3471.245 ms (00:03.471)

SELECT SUM(var1*var2) FROM bench_mul_var_var1ndigits_3; -- v8
Time: 3034.364 ms (00:03.034)
Time: 3025.351 ms (00:03.025)
Time: 3075.024 ms (00:03.075)

SELECT SUM(var1*var2) FROM bench_mul_var_var1ndigits_4; -- HEAD
Time: 4978.086 ms (00:04.978)
Time: 3580.283 ms (00:03.580)
Time: 3582.719 ms (00:03.583)

SELECT SUM(var1*var2) FROM bench_mul_var_var1ndigits_4; -- v8
Time: 3147.352 ms (00:03.147)
Time: 3135.903 ms (00:03.136)
Time: 3172.491 ms (00:03.172)

Regards,
Joel

Re: Optimize numeric multiplication for one and two base-NBASE digit multiplicands.

Reply via email to