Committed.
On Fri, Mar 28, 2025 at 10:25:26AM -0500, Nathan Bossart wrote:
> On Thu, Mar 27, 2025 at 03:31:27PM +0700, John Naylor wrote:
>> On Thu, Mar 27, 2025 at 10:38 AM Nathan Bossart
>> wrote:
>>> I also noticed a silly mistake in 0003 that would cause us to potentially
>>> skip part of the
On Thu, Mar 27, 2025 at 03:31:27PM +0700, John Naylor wrote:
> On Thu, Mar 27, 2025 at 10:38 AM Nathan Bossart
> wrote:
>> I also noticed a silly mistake in 0003 that would cause us to potentially
>> skip part of the tail. That should be fixed now.
>
> I'm not sure whether that meant it could re
On Thu, Mar 27, 2025 at 10:38 AM Nathan Bossart
wrote:
> I also noticed a silly mistake in 0003 that would cause us to potentially
> skip part of the tail. That should be fixed now.
I'm not sure whether that meant it could return the wrong answer, or
just make more work for paths further down.
I
On Wed, Mar 26, 2025 at 04:44:24PM -0500, Nathan Bossart wrote:
> IMHO these are acceptable results, at least for the use-cases I see in the
> tree. We might be able to minimize the difference between the Neon and SVE
> implementations on the low end with some additional code, but I'm really
> not
I've attached a new set of patches in which I've tried to address John's
feedback. I ran some new benchmarks with these patches. "M3" is an Apple
M3 (my laptop), "G3" is an r7g.4xlarge, and "G4" is an r8g.4xlarge. "no
SVE" means the patches are applied but the function pointer points to the
Neon
On Mon, Mar 24, 2025 at 06:34:45PM +0700, John Naylor wrote:
> On Sat, Mar 22, 2025 at 10:42 AM Nathan Bossart
> wrote:
>> * 0002 introduces the Neon implementation, which conveniently doesn't need
>> configure-time checks or function pointers. I noticed that some
>> compilers (e.g., Apple cl
On Sat, Mar 22, 2025 at 10:42 AM Nathan Bossart
wrote:
> * 0002 introduces the Neon implementation, which conveniently doesn't need
> configure-time checks or function pointers. I noticed that some
> compilers (e.g., Apple clang 16) compile in Neon instructions already,
> but our hand-roll
Looks good, the code is more readable now.
> For both Neon and SVE, I do see improvements with looping over 4
> registers at a time, so IMHO it's worth doing so even if it performs the
> same as 2-register blocks on some hardware.
There was no regression on Graviton 3 when using the 4-register
I've been preparing these for commit, and I've attached what I have so far.
A few notes:
* 0001 just renames the TRY_POPCNT_FAST macro to indicate that it's
x86_64-specific. IMO this is worth doing indpendent of this patch set,
but it's more important with the patch set since we need somethin
On Wed, Mar 13, 2025 at 12:02:07AM +, nathandboss...@gmail.com wrote:
> Those are nice results. I'm a little worried about the Neon implementation
> for smaller inputs since it uses a per-byte loop for the remaining bytes,
> though. If we can ensure there's no regression there, I think this p
On Wed, Mar 12, 2025 at 10:34:46AM +, chiranmoy.bhattacha...@fujitsu.com
wrote:
> On Wed, Mar 12, 2025 at 02:41:18AM +, nathandboss...@gmail.com wrote:
>
>> v5-no-sve is the result of using a function pointer, but pointing to the
>> "slow" versions instead of the SVE version. v5-sve is t
On Wed, Mar 12, 2025 at 02:41:18AM +, nathandboss...@gmail.com wrote:
> v5-no-sve is the result of using a function pointer, but pointing to the
> "slow" versions instead of the SVE version. v5-sve is the result of the
> latest patch in this thread on a machine with SVE support, and v5-4reg i
On Fri, Mar 07, 2025 at 03:20:07AM +, chiranmoy.bhattacha...@fujitsu.com
wrote:
> Sounds good. Let us know your findings.
Alright, here's what I saw on an R8g for drive_popcount(100, N):
8-byte words master v5-no-svev5-sve v5-4reg
1 2.540 ms 2.170
> Interesting. I do see different assembly with the 2 and 4 register
> versions, but I didn't get to testing it on a machine with SVE support
> today.
> Besides some additional benchmarking, I might make some small adjustments
> to the patch. But overall, it seems to be in decent shape.
Sounds
On Wed, Feb 19, 2025 at 09:31:50AM +, chiranmoy.bhattacha...@fujitsu.com
wrote:
>> Hm. Any idea why that is? I wonder if the compiler isn't using as many
>> SVE registers as it could for this.
>
> Not sure, we tried forcing loop unrolling using the below line in the MakeFile
> but the resul
> Hm. Any idea why that is? I wonder if the compiler isn't using as many
> SVE registers as it could for this.
Not sure, we tried forcing loop unrolling using the below line in the MakeFile
but the results are the same.
pg_popcount_sve.o: CFLAGS += ${CFLAGS_UNROLL_LOOPS} -march=native
> I've
On Thu, Feb 06, 2025 at 10:33:35AM -0600, Nathan Bossart wrote:
> On Thu, Feb 06, 2025 at 08:44:35AM +, chiranmoy.bhattacha...@fujitsu.com
> wrote:
>>> Does this hand-rolled loop unrolling offer any particular advantage? What
>>> do the numbers look like if we don't do this or if we process,
On Thu, Feb 06, 2025 at 08:44:35AM +, chiranmoy.bhattacha...@fujitsu.com
wrote:
>> Does this hand-rolled loop unrolling offer any particular advantage? What
>> do the numbers look like if we don't do this or if we process, say, 4
>> vectors at a time?
>
> The unrolled version performs better
> Hm. These results are so similar that I'm tempted to suggest we just
> remove the section of code dedicated to alignment. Is there any reason not
> to do that?
It seems that the double load overhead from unaligned memory access isn’t
too taxing, even on larger inputs. We can remove it to simpl
On Tue, Feb 04, 2025 at 09:01:33AM +, chiranmoy.bhattacha...@fujitsu.com
wrote:
>> +/*
>> + * For smaller inputs, aligning the buffer degrades the performance.
>> + * Therefore, the buffers only when the input size is sufficiently
>> large.
>> + */
>
>> Is the inverse true, i
> The meson configure check seems to fail on my machine
> This test looks quite different than the autoconf one. Why is that? I
would expect them to be the same. And I think ideally the test would check
that all the intrinsics functions we need are available.
Fixed, both meson and autoconf have
The meson configure check seems to fail on my machine:
error: too many arguments to function call, expected 0, have 1
10 | svuint64_t popcnt = svcntb(val);
| ~~ ^~~
error: returning '__SVInt64_t' from a function with incompa
On Wed, Jan 22, 2025 at 11:04:22AM +, chiranmoy.bhattacha...@fujitsu.com
wrote:
> If there is no further feedback from the community, may we submit the
> patch for the next commit fest?
I would encourage you to create a commitfest entry so that it is picked up
by our automated patch testing t
> This looks good. Thanks Chiranmoy and team. Can you address any other
> feedback from Nathan or others here? Then we can pursue further reviews and
> merging of the patch.
Thank you for the review.
If there is no further feedback from the community, may we submit the patch for
the next commit
Here is the updated patch using
pg_attribute_target("arch=armv8-a+sve") to compile the arch-specific
function instead of using compiler flags.
---
This looks good. Thanks Chiranmoy and team. Can you address any other
feedback from Nathan or others here? Then we can pursue further reviews
an
Hi all,
Here is the updated patch using pg_attribute_target("arch=armv8-a+sve") to
compile the arch-specific function instead of using compiler flags.
---
Chiranmoy
v3-0001-SVE-support-for-popcount-and-popcount-masked.patch
Description: v3-0001-SVE-support-for-popcount-and-popcount-masked.p
dboss...@gmail.com>
Sent: Wednesday, December 4, 2024 21:37
To: Malladi, Rama <mailto:ramamall...@hotmail.com>
Cc: Kirill Reshke <mailto:reshkekir...@gmail.com>;
pgsql-hackers
<mailto:pgsql-hack...@postgresql.org>
Subject: Re: [PATCH] SVE popcount support
On Wed, Dec 04, 2024 at
Malladi, Rama
*Cc:* Kirill Reshke ; pgsql-hackers
*Subject:* Re: [PATCH] SVE popcount support
On Wed, Dec 04, 2024 at 08:51:39AM -0600, Malladi, Rama wrote:
> Thank you, Kirill, for the review and the feedback. Please find
inline my
> reply and an updated patch.
Thanks for the updated
On Wed, Dec 04, 2024 at 08:51:39AM -0600, Malladi, Rama wrote:
> Thank you, Kirill, for the review and the feedback. Please find inline my
> reply and an updated patch.
Thanks for the updated patch. I have a couple of high-level comments.
Would you mind adding this to the commitfest system
(https
On Wed, Nov 27, 2024 at 03:43:27PM +, Malladi, Rama wrote:
> • Attachments protected by Amazon:
> • 0001-SVE-popcount-support.patch |
> • SVE-popcount-support-PostgreSQL.png |
>
> Amazon has replaced the attachments in this email with download links.
> Downloads will be available until D
On Thu, 28 Nov 2024 at 20:22, Malladi, Rama wrote:
>
> Attachments protected by Amazon: 0001-SVE-popcount-support.patch |
SVE-popcount-support-PostgreSQL.png |
> Amazon has replaced the attachments in this email with download links.
Downloads will be available until December 27, 2024, 15:43 (UTC+0
On Thu, 28 Nov 2024 at 20:22, Malladi, Rama wrote:
>
> Attachments protected by Amazon: 0001-SVE-popcount-support.patch |
> SVE-popcount-support-PostgreSQL.png |
> Amazon has replaced the attachments in this email with download links.
> Downloads will be available until December 27, 2024, 15:43
32 matches
Mail list logo