Re: [PATCH] SVE popcount support

2025-03-28 Thread Nathan Bossart
Committed. On Fri, Mar 28, 2025 at 10:25:26AM -0500, Nathan Bossart wrote: > On Thu, Mar 27, 2025 at 03:31:27PM +0700, John Naylor wrote: >> On Thu, Mar 27, 2025 at 10:38 AM Nathan Bossart >> wrote: >>> I also noticed a silly mistake in 0003 that would cause us to potentially >>> skip part of the

Re: [PATCH] SVE popcount support

2025-03-28 Thread Nathan Bossart
On Thu, Mar 27, 2025 at 03:31:27PM +0700, John Naylor wrote: > On Thu, Mar 27, 2025 at 10:38 AM Nathan Bossart > wrote: >> I also noticed a silly mistake in 0003 that would cause us to potentially >> skip part of the tail. That should be fixed now. > > I'm not sure whether that meant it could re

Re: [PATCH] SVE popcount support

2025-03-27 Thread John Naylor
On Thu, Mar 27, 2025 at 10:38 AM Nathan Bossart wrote: > I also noticed a silly mistake in 0003 that would cause us to potentially > skip part of the tail. That should be fixed now. I'm not sure whether that meant it could return the wrong answer, or just make more work for paths further down. I

Re: [PATCH] SVE popcount support

2025-03-26 Thread Nathan Bossart
On Wed, Mar 26, 2025 at 04:44:24PM -0500, Nathan Bossart wrote: > IMHO these are acceptable results, at least for the use-cases I see in the > tree. We might be able to minimize the difference between the Neon and SVE > implementations on the low end with some additional code, but I'm really > not

Re: [PATCH] SVE popcount support

2025-03-26 Thread Nathan Bossart
I've attached a new set of patches in which I've tried to address John's feedback. I ran some new benchmarks with these patches. "M3" is an Apple M3 (my laptop), "G3" is an r7g.4xlarge, and "G4" is an r8g.4xlarge. "no SVE" means the patches are applied but the function pointer points to the Neon

Re: [PATCH] SVE popcount support

2025-03-24 Thread Nathan Bossart
On Mon, Mar 24, 2025 at 06:34:45PM +0700, John Naylor wrote: > On Sat, Mar 22, 2025 at 10:42 AM Nathan Bossart > wrote: >> * 0002 introduces the Neon implementation, which conveniently doesn't need >> configure-time checks or function pointers. I noticed that some >> compilers (e.g., Apple cl

Re: [PATCH] SVE popcount support

2025-03-24 Thread John Naylor
On Sat, Mar 22, 2025 at 10:42 AM Nathan Bossart wrote: > * 0002 introduces the Neon implementation, which conveniently doesn't need > configure-time checks or function pointers. I noticed that some > compilers (e.g., Apple clang 16) compile in Neon instructions already, > but our hand-roll

Re: [PATCH] SVE popcount support

2025-03-23 Thread chiranmoy.bhattacha...@fujitsu.com
Looks good, the code is more readable now. > For both Neon and SVE, I do see improvements with looping over 4 > registers at a time, so IMHO it's worth doing so even if it performs the > same as 2-register blocks on some hardware. There was no regression on Graviton 3 when using the 4-register

Re: [PATCH] SVE popcount support

2025-03-21 Thread Nathan Bossart
I've been preparing these for commit, and I've attached what I have so far. A few notes: * 0001 just renames the TRY_POPCNT_FAST macro to indicate that it's x86_64-specific. IMO this is worth doing indpendent of this patch set, but it's more important with the patch set since we need somethin

Re: [PATCH] SVE popcount support

2025-03-19 Thread chiranmoy.bhattacha...@fujitsu.com
On Wed, Mar 13, 2025 at 12:02:07AM +, nathandboss...@gmail.com wrote: > Those are nice results. I'm a little worried about the Neon implementation > for smaller inputs since it uses a per-byte loop for the remaining bytes, > though. If we can ensure there's no regression there, I think this p

Re: [PATCH] SVE popcount support

2025-03-12 Thread Nathan Bossart
On Wed, Mar 12, 2025 at 10:34:46AM +, chiranmoy.bhattacha...@fujitsu.com wrote: > On Wed, Mar 12, 2025 at 02:41:18AM +, nathandboss...@gmail.com wrote: > >> v5-no-sve is the result of using a function pointer, but pointing to the >> "slow" versions instead of the SVE version. v5-sve is t

Re: [PATCH] SVE popcount support

2025-03-12 Thread chiranmoy.bhattacha...@fujitsu.com
On Wed, Mar 12, 2025 at 02:41:18AM +, nathandboss...@gmail.com wrote: > v5-no-sve is the result of using a function pointer, but pointing to the > "slow" versions instead of the SVE version. v5-sve is the result of the > latest patch in this thread on a machine with SVE support, and v5-4reg i

Re: [PATCH] SVE popcount support

2025-03-11 Thread Nathan Bossart
On Fri, Mar 07, 2025 at 03:20:07AM +, chiranmoy.bhattacha...@fujitsu.com wrote: > Sounds good. Let us know your findings. Alright, here's what I saw on an R8g for drive_popcount(100, N): 8-byte words master v5-no-svev5-sve v5-4reg 1 2.540 ms 2.170

Re: [PATCH] SVE popcount support

2025-03-06 Thread chiranmoy.bhattacha...@fujitsu.com
> Interesting. I do see different assembly with the 2 and 4 register > versions, but I didn't get to testing it on a machine with SVE support > today. > Besides some additional benchmarking, I might make some small adjustments > to the patch. But overall, it seems to be in decent shape. Sounds

Re: [PATCH] SVE popcount support

2025-03-03 Thread Nathan Bossart
On Wed, Feb 19, 2025 at 09:31:50AM +, chiranmoy.bhattacha...@fujitsu.com wrote: >> Hm. Any idea why that is? I wonder if the compiler isn't using as many >> SVE registers as it could for this. > > Not sure, we tried forcing loop unrolling using the below line in the MakeFile > but the resul

Re: [PATCH] SVE popcount support

2025-02-19 Thread chiranmoy.bhattacha...@fujitsu.com
> Hm. Any idea why that is? I wonder if the compiler isn't using as many > SVE registers as it could for this. Not sure, we tried forcing loop unrolling using the below line in the MakeFile but the results are the same. pg_popcount_sve.o: CFLAGS += ${CFLAGS_UNROLL_LOOPS} -march=native > I've

Re: [PATCH] SVE popcount support

2025-02-14 Thread Nathan Bossart
On Thu, Feb 06, 2025 at 10:33:35AM -0600, Nathan Bossart wrote: > On Thu, Feb 06, 2025 at 08:44:35AM +, chiranmoy.bhattacha...@fujitsu.com > wrote: >>> Does this hand-rolled loop unrolling offer any particular advantage? What >>> do the numbers look like if we don't do this or if we process,

Re: [PATCH] SVE popcount support

2025-02-06 Thread Nathan Bossart
On Thu, Feb 06, 2025 at 08:44:35AM +, chiranmoy.bhattacha...@fujitsu.com wrote: >> Does this hand-rolled loop unrolling offer any particular advantage? What >> do the numbers look like if we don't do this or if we process, say, 4 >> vectors at a time? > > The unrolled version performs better

Re: [PATCH] SVE popcount support

2025-02-06 Thread chiranmoy.bhattacha...@fujitsu.com
> Hm. These results are so similar that I'm tempted to suggest we just > remove the section of code dedicated to alignment. Is there any reason not > to do that? It seems that the double load overhead from unaligned memory access isn’t too taxing, even on larger inputs. We can remove it to simpl

Re: [PATCH] SVE popcount support

2025-02-05 Thread Nathan Bossart
On Tue, Feb 04, 2025 at 09:01:33AM +, chiranmoy.bhattacha...@fujitsu.com wrote: >> +/* >> + * For smaller inputs, aligning the buffer degrades the performance. >> + * Therefore, the buffers only when the input size is sufficiently >> large. >> + */ > >> Is the inverse true, i

Re: [PATCH] SVE popcount support

2025-02-04 Thread chiranmoy.bhattacha...@fujitsu.com
> The meson configure check seems to fail on my machine > This test looks quite different than the autoconf one. Why is that? I would expect them to be the same. And I think ideally the test would check that all the intrinsics functions we need are available. Fixed, both meson and autoconf have

Re: [PATCH] SVE popcount support

2025-01-24 Thread Nathan Bossart
The meson configure check seems to fail on my machine: error: too many arguments to function call, expected 0, have 1 10 | svuint64_t popcnt = svcntb(val); | ~~ ^~~ error: returning '__SVInt64_t' from a function with incompa

Re: [PATCH] SVE popcount support

2025-01-22 Thread Nathan Bossart
On Wed, Jan 22, 2025 at 11:04:22AM +, chiranmoy.bhattacha...@fujitsu.com wrote: > If there is no further feedback from the community, may we submit the > patch for the next commit fest? I would encourage you to create a commitfest entry so that it is picked up by our automated patch testing t

Re: [PATCH] SVE popcount support

2025-01-22 Thread chiranmoy.bhattacha...@fujitsu.com
> This looks good. Thanks Chiranmoy and team. Can you address any other > feedback from Nathan or others here? Then we can pursue further reviews and > merging of the patch. Thank you for the review. If there is no further feedback from the community, may we submit the patch for the next commit

Re: [PATCH] SVE popcount support

2025-01-13 Thread Malladi, Rama
Here is the updated patch using pg_attribute_target("arch=armv8-a+sve") to compile the arch-specific function instead of using compiler flags. --- This looks good. Thanks Chiranmoy and team. Can you address any other feedback from Nathan or others here? Then we can pursue further reviews an

Re: [PATCH] SVE popcount support

2025-01-10 Thread chiranmoy.bhattacha...@fujitsu.com
Hi all, Here is the updated patch using pg_attribute_target("arch=armv8-a+sve") to compile the arch-specific function instead of using compiler flags. --- Chiranmoy v3-0001-SVE-support-for-popcount-and-popcount-masked.patch Description: v3-0001-SVE-support-for-popcount-and-popcount-masked.p

Re: [PATCH] SVE popcount support

2024-12-11 Thread chiranmoy.bhattacha...@fujitsu.com
dboss...@gmail.com> Sent: Wednesday, December 4, 2024 21:37 To: Malladi, Rama <mailto:ramamall...@hotmail.com> Cc: Kirill Reshke <mailto:reshkekir...@gmail.com>; pgsql-hackers <mailto:pgsql-hack...@postgresql.org> Subject: Re: [PATCH] SVE popcount support On Wed, Dec 04, 2024 at

Re: [PATCH] SVE popcount support

2024-12-09 Thread Malladi, Rama
Malladi, Rama *Cc:* Kirill Reshke ; pgsql-hackers *Subject:* Re: [PATCH] SVE popcount support On Wed, Dec 04, 2024 at 08:51:39AM -0600, Malladi, Rama wrote: > Thank you, Kirill, for the review and the feedback. Please find inline my > reply and an updated patch. Thanks for the updated

Re: [PATCH] SVE popcount support

2024-12-04 Thread Nathan Bossart
On Wed, Dec 04, 2024 at 08:51:39AM -0600, Malladi, Rama wrote: > Thank you, Kirill, for the review and the feedback. Please find inline my > reply and an updated patch. Thanks for the updated patch. I have a couple of high-level comments. Would you mind adding this to the commitfest system (https

Re: [PATCH] SVE popcount support

2024-11-29 Thread Bruce Momjian
On Wed, Nov 27, 2024 at 03:43:27PM +, Malladi, Rama wrote: > • Attachments protected by Amazon: > • 0001-SVE-popcount-support.patch | > • SVE-popcount-support-PostgreSQL.png | > > Amazon has replaced the attachments in this email with download links. > Downloads will be available until D

Re: [PATCH] SVE popcount support

2024-11-29 Thread Kirill Reshke
On Thu, 28 Nov 2024 at 20:22, Malladi, Rama wrote: > > Attachments protected by Amazon: 0001-SVE-popcount-support.patch | SVE-popcount-support-PostgreSQL.png | > Amazon has replaced the attachments in this email with download links. Downloads will be available until December 27, 2024, 15:43 (UTC+0

Re: [PATCH] SVE popcount support

2024-11-28 Thread Kirill Reshke
On Thu, 28 Nov 2024 at 20:22, Malladi, Rama wrote: > > Attachments protected by Amazon: 0001-SVE-popcount-support.patch | > SVE-popcount-support-PostgreSQL.png | > Amazon has replaced the attachments in this email with download links. > Downloads will be available until December 27, 2024, 15:43