On Tue, Jul 9, 2024, at 14:01, Dean Rasheed wrote:
> One thing I noticed while testing the earlier patches on this thread
> was that they were significantly faster if they used unsigned integers
> rather than signed integers. I think the reason is that operations
> like "x / 1" and "x % 1"
On Tue, Jul 9, 2024, at 16:11, Joel Jacobson wrote:
> I added some more ndigits test cases:
Ops, please ignore previous benchmark;
I had forgot to commit in between the measurements,
so they all ran in the same db txn,
which caused a lot of noise on few ndigits.
New benchmark:
> /*
> * Intel Co
On Tue, Jul 9, 2024, at 14:01, Dean Rasheed wrote:
> Before considering the other patches to optimise for larger inputs, I
> think it's worth optimising the existing mul_var() code as much as
> possible.
>
> One thing I noticed while testing the earlier patches on this thread
> was that they were s
On Tue, 9 Jul 2024 at 10:11, Dean Rasheed wrote:
>
> OK, I have committed this.
>
Before considering the other patches to optimise for larger inputs, I
think it's worth optimising the existing mul_var() code as much as
possible.
One thing I noticed while testing the earlier patches on this threa
On Sat, 6 Jul 2024 at 12:17, Joel Jacobson wrote:
>
> > I think this is good to go, so unless there are any further comments,
> > I plan to commit it soon.
>
> LGTM.
>
OK, I have committed this.
At the last minute, I changed the name of the new function to
mul_var_short() because "short" is prob
On Sat, Jul 6, 2024, at 11:34, Dean Rasheed wrote:
> On Fri, 5 Jul 2024 at 18:37, Joel Jacobson wrote:
>>
>> On Fri, Jul 5, 2024, at 18:42, Joel Jacobson wrote:
>> > Very nice, v7-optimize-numeric-mul_var-small-var1-arbitrary-var2.patch
>> > is now the winner on all my CPUs:
>>
>> I thought it wou
On Fri, 5 Jul 2024 at 18:37, Joel Jacobson wrote:
>
> On Fri, Jul 5, 2024, at 18:42, Joel Jacobson wrote:
> > Very nice, v7-optimize-numeric-mul_var-small-var1-arbitrary-var2.patch
> > is now the winner on all my CPUs:
>
> I thought it would be interesting to also measure the isolated effect
> on
On Fri, Jul 5, 2024, at 18:42, Joel Jacobson wrote:
> Very nice, v7-optimize-numeric-mul_var-small-var1-arbitrary-var2.patch
> is now the winner on all my CPUs:
I thought it would be interesting to also measure the isolated effect
on just numeric_mul() without the query overhead.
Included var1ndi
On Fri, Jul 5, 2024, at 17:41, Dean Rasheed wrote:
> On Fri, 5 Jul 2024 at 12:56, Joel Jacobson wrote:
>>
>> Interesting you got so bad bench results for v6-mul_var_int64.patch
>> for var1ndigits=4, that patch is actually the winner on AMD Ryzen 9 7950X3D.
>
> Interesting.
I remeasured just to be
On Fri, 5 Jul 2024 at 12:56, Joel Jacobson wrote:
>
> Interesting you got so bad bench results for v6-mul_var_int64.patch
> for var1ndigits=4, that patch is actually the winner on AMD Ryzen 9 7950X3D.
Interesting.
> On Intel Core i9-14900K the winner is
> v6-optimize-numeric-mul_var-small-var1-
On Thu, Jul 4, 2024, at 20:43, Dean Rasheed wrote:
> On Wed, 3 Jul 2024 at 21:45, Joel Jacobson wrote:
>>
>> > On Wed, Jul 3, 2024, at 20:57, Dean Rasheed wrote:
>> >> I wouldn't expect it to ever be off by more than 1
>> >
>> > OK, so then the cases I found where it was off by 2 for the mul_var_i
On Wed, 3 Jul 2024 at 21:45, Joel Jacobson wrote:
>
> > On Wed, Jul 3, 2024, at 20:57, Dean Rasheed wrote:
> >> I wouldn't expect it to ever be off by more than 1
> >
> > OK, so then the cases I found where it was off by 2 for the mul_var_int()
> > patch
> > are unexpected?
>
> Sorry, I meant off
On Thu, Jul 4, 2024, at 09:38, Joel Jacobson wrote:
> Summary of benchmark results:
>
> cpu | var1ndigits | winner
> --+-+-
..
> v5-optimize-numeric-mul_var-small-
On Wed, Jul 3, 2024, at 13:17, Dean Rasheed wrote:
> Anyway, here are both patches for comparison. I'll stop hacking for a
> while and let you see what you make of these.
>
> Regards,
> Dean
>
> Attachments:
> * v5-optimize-numeric-mul_var-small-var1-arbitrary-var2.patch
> * v5-add-mul_var_int.patc
On Wed, Jul 3, 2024, at 22:27, Joel Jacobson wrote:
> On Wed, Jul 3, 2024, at 20:57, Dean Rasheed wrote:
>> I wouldn't expect it to ever be off by more than 1, given that
>> MUL_GUARD_DIGITS = 2, which corresponds to 8 decimal digits, and the
>> number of digits in the smaller input (and hence the
On Wed, Jul 3, 2024, at 20:57, Dean Rasheed wrote:
> Ah yes, I think I was looking at a newer version of the code where I'd
> already fixed that bug. Unless you think there are still bugs in any
> of the boundary checks, which is entirely possible.
Ah, that explains it.
And no, I can't find any ot
On Wed, Jul 3, 2024, at 15:48, Joel Jacobson wrote:
> On Wed, Jul 3, 2024, at 13:17, Dean Rasheed wrote:
>> On Tue, 2 Jul 2024 at 21:10, Joel Jacobson wrote:
>>>
>>> I found the bug in the case 3 code,
>>> and it turns out the same type of bug also exists in the case 2 code:
>>>
>>>
On Wed, 3 Jul 2024 at 14:49, Joel Jacobson wrote:
>
> Hmm, I don't see how the case 2 code can be correct?
> If, like you say, res_ndigits can't be less than 3, that means it can be 3,
> right?
> And if res_ndigits=3 then `var2digits[res_ndigits - 4]` would try to access
> `var2digits[-1]`.
>
A
On Wed, Jul 3, 2024, at 13:17, Dean Rasheed wrote:
> On Tue, 2 Jul 2024 at 21:10, Joel Jacobson wrote:
>>
>> I found the bug in the case 3 code,
>> and it turns out the same type of bug also exists in the case 2 code:
>>
>> case 2:
>> newdig
On Mon, Jul 1, 2024 at 6:19 PM Dean Rasheed wrote:
> Repeating your benchmark where both numbers have up to 2 NBASE-digits,
> this new approach was slightly faster:
>
> SELECT SUM(num1*num2) FROM bench_mul_var; -- HEAD
> Time: 4762.990 ms (00:04.763)
> Time: 4332.166 ms (00:04.332)
> Time: 4276.21
Em qua., 3 de jul. de 2024 às 08:18, Dean Rasheed
escreveu:
> On Tue, 2 Jul 2024 at 21:10, Joel Jacobson wrote:
> >
> > I found the bug in the case 3 code,
> > and it turns out the same type of bug also exists in the case 2 code:
> >
> > case 2:
> >
On Tue, 2 Jul 2024 at 21:10, Joel Jacobson wrote:
>
> I found the bug in the case 3 code,
> and it turns out the same type of bug also exists in the case 2 code:
>
> case 2:
> newdig = (int) var1digits[1] *
> var2digits[res_ndigits - 4];
>
>
On Tue, 2 Jul 2024 at 20:55, Joel Jacobson wrote:
>
> Interesting, I actually think there is a bug in the normal mul_var() code.
> Found a case that rounds down, when it should round up:
>
> Calling mul_var() with:
> var1=51.2945442386599
> var2=0.828548712212
> rscale=0
>
> returns 42, but I thin
On Tue, Jul 2, 2024, at 22:10, Joel Jacobson wrote:
> * v4-optimize-numeric-mul_var-small-var1-arbitrary-var2.patch
Instead of these boundary checks, maybe it would be cleaner to just
skip this optimization if there are too few res_ndigits?
/Joel
On Tue, Jul 2, 2024, at 21:55, Joel Jacobson wrote:
> On Tue, Jul 2, 2024, at 20:53, Joel Jacobson wrote:
>> Trying to wrap my head around what could cause this.
I found the bug in the case 3 code,
and it turns out the same type of bug also exists in the case 2 code:
case
On Tue, Jul 2, 2024, at 20:53, Joel Jacobson wrote:
> Trying to wrap my head around what could cause this.
>
> It's rounding down instead of up, and these cases all end with decimal
> .500.
Interesting, I actually think there is a bug in the normal mul_var() code.
Found a case that rounds dow
On Tue, Jul 2, 2024, at 18:20, Joel Jacobson wrote:
> * v3-optimize-numeric-mul_var-small-var1-arbitrary-var2.patch
Hmm, v3 contains a bug which I haven't been able to solve yet.
Reporting now to avoid time waste reviewing it since it's buggy.
The attached patch is how I tested and found the bug.
On Tue, Jul 2, 2024, at 13:44, Dean Rasheed wrote:
>> Can you think of an example that should trigger the bug?
>
> .0001 * 5000._ with rscale = 0 triggers it (returned
> 50004999 instead of 50005000).
Thanks, helpful.
Attached patch adds the var1ndigits=3 case.
Benchmark:
/*
* Appl
On Tue, 2 Jul 2024 at 11:23, Joel Jacobson wrote:
>
> Just to be able to verify mul_var() is working as expected when
> rscale is less than the full result, I've added a numeric_mul_patched()
> function that takes a third rscale_adjustment parameter:
Yeah, we could expose such a function, and may
On Tue, Jul 2, 2024, at 11:05, Joel Jacobson wrote:
> On Tue, Jul 2, 2024, at 10:22, Dean Rasheed wrote:
>> Shortly after posting that, I realised that there was a small bug. This bit:
>>
>> case 2:
>> newdig = (int) var1digits[1] * var2digits[res_ndigits - 4];
>>
>> isn
On Tue, Jul 2, 2024, at 10:22, Dean Rasheed wrote:
> Shortly after posting that, I realised that there was a small bug. This bit:
>
> case 2:
> newdig = (int) var1digits[1] * var2digits[res_ndigits - 4];
>
> isn't quite right in the case where rscale is less than the ful
On Tue, 2 Jul 2024 at 08:50, Joel Jacobson wrote:
>
> On Tue, Jul 2, 2024, at 00:19, Dean Rasheed wrote:
>
> > Attachments:
> > * optimize-numeric-mul_var-small-var1-arbitrary-var2.patch.txt
>
Shortly after posting that, I realised that there was a small bug. This bit:
case 2:
On Tue, Jul 2, 2024, at 00:19, Dean Rasheed wrote:
> I had a play with this, and came up with a slightly different way of
> doing it that works for var2 of any size, as long as var1 is just 1 or
> 2 digits.
>
> Repeating your benchmark where both numbers have up to 2 NBASE-digits,
> this new approa
On Mon, 1 Jul 2024 at 20:56, Joel Jacobson wrote:
>
> Below is a more realistic benchmark
>
> CREATE TABLE bench_mul_var (num1 numeric, num2 numeric);
>
> INSERT INTO bench_mul_var (num1, num2)
> SELECT random(0::numeric,1e8::numeric), random(0::numeric,1e8::numeric) FROM
> generate_series(1,1e8)
On Mon, Jul 1, 2024, at 15:14, Joel Jacobson wrote:
> * 0001-Optimize-mul_var-for-var2ndigits-4.patch
Found a typo, fixed in new version.
The int128 version is still slower though,
I wonder if there is something that can be done to speed it up further.
Below is a more realistic benchmark than ju
On Mon, Jul 1, 2024, at 15:11, Joel Jacobson wrote:
> Not really sure why. Maybe the code I tried can be optimized further:
If anyone want experiment with the int128 version,
here is a patch that adds a separate numeric_mul_patched() function,
so it's easier to benchmark against the unmodified num
On Mon, Jul 1, 2024, at 14:25, Dagfinn Ilmari Mannsåker wrote:
> div_var() also has an optimisation for 3- and 4-digit operands under
> HAVE_INT128 (added in commit 0aa38db56bf), would that make sense in
> mul_var() too?
I considered it, but it only gives a marginal speed-up on Intel Core i9-14900
"Joel Jacobson" writes:
> Hello hackers,
>
> Attached patch introduces an optimization of mul_var() in numeric.c,
> targeting cases where the multiplicands consist of only one or two
> base-NBASE digits. Such small multiplicands can fit into an int64 and
> thus be computed directly, resulting in
On Mon, Jul 1, 2024, at 08:04, Joel Jacobson wrote:
> * 0001-optimize-numeric-mul_var-small-factors.patch
New version to silence maybe-uninitialized error reported by cfbot.
/Joel
v2-0001-optimize-numeric-mul_var-small-factors.patch
Description: Binary data
Hello hackers,
Attached patch introduces an optimization of mul_var() in numeric.c, targeting
cases where the multiplicands consist of only one or two base-NBASE digits.
Such small multiplicands can fit into an int64 and thus be computed directly,
resulting in a significant performance improvem
40 matches
Mail list logo