Re: [PATCH] SVE popcount support

2025-03-28 Thread Nathan Bossart
Committed. On Fri, Mar 28, 2025 at 10:25:26AM -0500, Nathan Bossart wrote: > On Thu, Mar 27, 2025 at 03:31:27PM +0700, John Naylor wrote: >> On Thu, Mar 27, 2025 at 10:38 AM Nathan Bossart >> wrote: >>> I also noticed a silly mistake in 0003 that would cause us to potentially >>> skip part of the

Re: [PATCH] SVE popcount support

2025-03-28 Thread Nathan Bossart
On Thu, Mar 27, 2025 at 03:31:27PM +0700, John Naylor wrote: > On Thu, Mar 27, 2025 at 10:38 AM Nathan Bossart > wrote: >> I also noticed a silly mistake in 0003 that would cause us to potentially >> skip part of the tail. That should be fixed now. > > I'm not sure whether that meant it could re

Re: [PATCH] SVE popcount support

2025-03-27 Thread John Naylor
On Thu, Mar 27, 2025 at 10:38 AM Nathan Bossart wrote: > I also noticed a silly mistake in 0003 that would cause us to potentially > skip part of the tail. That should be fixed now. I'm not sure whether that meant it could return the wrong answer, or just make more work for paths further down. I

Re: [PATCH] SVE popcount support

2025-03-26 Thread Nathan Bossart
/* USE_AVX512_POPCNT_WITH_RUNTIME_CHECK */ -- 2.39.5 (Apple Git-154) >From ee81eded16a5b7987b0fdf180f6a411bef2810b6 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Mon, 24 Mar 2025 20:10:23 -0500 Subject: [PATCH v10 2/3] Add Neon popcount support. This commit introduces

Re: [PATCH] SVE popcount support

2025-03-26 Thread Nathan Bossart
/* TRY_POPCNT_FAST */ +#endif /* TRY_POPCNT_X86_64 */ #endif /* USE_AVX512_POPCNT_WITH_RUNTIME_CHECK */ -- 2.39.5 (Apple Git-154) >From 5953da8e6c4d167954cbedfca58bd7558feb8620 Mon Sep 17 00:00:00

Re: [PATCH] SVE popcount support

2025-03-24 Thread Nathan Bossart
On Mon, Mar 24, 2025 at 06:34:45PM +0700, John Naylor wrote: > On Sat, Mar 22, 2025 at 10:42 AM Nathan Bossart > wrote: >> * 0002 introduces the Neon implementation, which conveniently doesn't need >> configure-time checks or function pointers. I noticed that some >> compilers (e.g., Apple cl

Re: [PATCH] SVE popcount support

2025-03-24 Thread John Naylor
On Sat, Mar 22, 2025 at 10:42 AM Nathan Bossart wrote: > * 0002 introduces the Neon implementation, which conveniently doesn't need > configure-time checks or function pointers. I noticed that some > compilers (e.g., Apple clang 16) compile in Neon instructions already, > but our hand-roll

Re: [PATCH] SVE popcount support

2025-03-23 Thread chiranmoy.bhattacha...@fujitsu.com
Looks good, the code is more readable now. > For both Neon and SVE, I do see improvements with looping over 4 > registers at a time, so IMHO it's worth doing so even if it performs the > same as 2-register blocks on some hardware. There was no regression on Graviton 3 when using the 4-register

Re: [PATCH] SVE popcount support

2025-03-21 Thread Nathan Bossart
TRY_POPCNT_FAST */ +#endif /* POPCNT_X86_64 */ #endif /* USE_AVX512_POPCNT_WITH_RUNTIME_CHECK */ -- 2.39.5 (Apple Git-154) >From 3ebc1321e6782919980d3410d3bc527fd77751fc Mon Sep 17 00:00:00 2001

Re: [PATCH] SVE popcount support

2025-03-19 Thread chiranmoy.bhattacha...@fujitsu.com
On Wed, Mar 13, 2025 at 12:02:07AM +, nathandboss...@gmail.com wrote: > Those are nice results. I'm a little worried about the Neon implementation > for smaller inputs since it uses a per-byte loop for the remaining bytes, > though. If we can ensure there's no regression there, I think this p

Re: remove open-coded popcount in acl.c

2025-03-12 Thread Nathan Bossart
On Wed, Mar 12, 2025 at 01:35:39PM -0500, Nathan Bossart wrote: > Thanks for the quick review. I'll plan on committing this shortly if CI is > happy. Committed. -- nathan

Re: remove open-coded popcount in acl.c

2025-03-12 Thread Álvaro Herrera
On 2025-Mar-12, Nathan Bossart wrote: > On Wed, Mar 12, 2025 at 05:23:25PM +0100, Álvaro Herrera wrote: > > Strange: this code is not covered by any tests. > > > > https://coverage.postgresql.org/src/backend/utils/adt/acl.c.gcov.html#5533 > > https://coverage.postgresql.org/src/backend/utils/adt/

Re: remove open-coded popcount in acl.c

2025-03-12 Thread Nathan Bossart
On Wed, Mar 12, 2025 at 07:34:16PM +0100, Álvaro Herrera wrote: > Thanks :-) I confirm that this covers the code in select_best_grantor > that you're modifying. Thanks for the quick review. I'll plan on committing this shortly if CI is happy. -- nathan

Re: remove open-coded popcount in acl.c

2025-03-12 Thread Nathan Bossart
l, it's easy enough to add some basic tests for the grantor selection machinery. Here's a first try. -- nathan >From d3cf9ca237f647ebcca20c55c8302f00f716c459 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Wed, 12 Mar 2025 10:45:12 -0500 Subject: [PATCH v2 1/1] Remove open-coded pop

Re: [PATCH] SVE popcount support

2025-03-12 Thread Nathan Bossart
On Wed, Mar 12, 2025 at 10:34:46AM +, chiranmoy.bhattacha...@fujitsu.com wrote: > On Wed, Mar 12, 2025 at 02:41:18AM +, nathandboss...@gmail.com wrote: > >> v5-no-sve is the result of using a function pointer, but pointing to the >> "slow" versions instead of the SVE version. v5-sve is t

Re: remove open-coded popcount in acl.c

2025-03-12 Thread Álvaro Herrera
On 2025-Mar-12, Nathan Bossart wrote: > There's a count_one_bits() function in acl.c that can be replaced with a > call to pg_popcount64(). This isn't performance-critical code, but IMHO we > might as well use the centralized implementation. Makes sense. Patch looks good to me. > @@ -5532,7 +5

remove open-coded popcount in acl.c

2025-03-12 Thread Nathan Bossart
From: Nathan Bossart Date: Wed, 12 Mar 2025 10:45:12 -0500 Subject: [PATCH v1 1/1] Remove open-coded popcount in acl.c. --- src/backend/utils/adt/acl.c | 20 +--- 1 file changed, 1 insertion(+), 19 deletions(-) diff --git a/src/backend/utils/adt/acl.c b/src/backend/utils/ad

Re: [PATCH] SVE popcount support

2025-03-12 Thread chiranmoy.bhattacha...@fujitsu.com
On Wed, Mar 12, 2025 at 02:41:18AM +, nathandboss...@gmail.com wrote: > v5-no-sve is the result of using a function pointer, but pointing to the > "slow" versions instead of the SVE version. v5-sve is the result of the > latest patch in this thread on a machine with SVE support, and v5-4reg i

Re: [PATCH] SVE popcount support

2025-03-11 Thread Nathan Bossart
On Fri, Mar 07, 2025 at 03:20:07AM +, chiranmoy.bhattacha...@fujitsu.com wrote: > Sounds good. Let us know your findings. Alright, here's what I saw on an R8g for drive_popcount(100, N): 8-byte words master v5-no-svev5-sve v5-4reg 1 2.540 ms 2.170

Re: [PATCH] SVE popcount support

2025-03-06 Thread chiranmoy.bhattacha...@fujitsu.com
> Interesting. I do see different assembly with the 2 and 4 register > versions, but I didn't get to testing it on a machine with SVE support > today. > Besides some additional benchmarking, I might make some small adjustments > to the patch. But overall, it seems to be in decent shape. Sounds

Re: [PATCH] SVE popcount support

2025-03-03 Thread Nathan Bossart
On Wed, Feb 19, 2025 at 09:31:50AM +, chiranmoy.bhattacha...@fujitsu.com wrote: >> Hm. Any idea why that is? I wonder if the compiler isn't using as many >> SVE registers as it could for this. > > Not sure, we tried forcing loop unrolling using the below line in the MakeFile > but the resul

Re: [PATCH] SVE popcount support

2025-02-19 Thread chiranmoy.bhattacha...@fujitsu.com
oose" logic guarded by TRY_POPCNT_FAST. The latest patch bypasses TRY_POPCNT_FAST by having a separate choose logic for aarch64. -Chiranmoy v5-0001-SVE-support-for-popcount-and-popcount-masked.patch Description: v5-0001-SVE-support-for-popcount-and-popcount-masked.patch

Re: [PATCH] SVE popcount support

2025-02-14 Thread Nathan Bossart
On Thu, Feb 06, 2025 at 10:33:35AM -0600, Nathan Bossart wrote: > On Thu, Feb 06, 2025 at 08:44:35AM +, chiranmoy.bhattacha...@fujitsu.com > wrote: >>> Does this hand-rolled loop unrolling offer any particular advantage? What >>> do the numbers look like if we don't do this or if we process,

Re: [PATCH] SVE popcount support

2025-02-06 Thread Nathan Bossart
On Thu, Feb 06, 2025 at 08:44:35AM +, chiranmoy.bhattacha...@fujitsu.com wrote: >> Does this hand-rolled loop unrolling offer any particular advantage? What >> do the numbers look like if we don't do this or if we process, say, 4 >> vectors at a time? > > The unrolled version performs better

Re: [PATCH] SVE popcount support

2025-02-06 Thread chiranmoy.bhattacha...@fujitsu.com
> Hm. These results are so similar that I'm tempted to suggest we just > remove the section of code dedicated to alignment. Is there any reason not > to do that? It seems that the double load overhead from unaligned memory access isn’t too taxing, even on larger inputs. We can remove it to simpl

Re: [PATCH] SVE popcount support

2025-02-05 Thread Nathan Bossart
On Tue, Feb 04, 2025 at 09:01:33AM +, chiranmoy.bhattacha...@fujitsu.com wrote: >> +/* >> + * For smaller inputs, aligning the buffer degrades the performance. >> + * Therefore, the buffers only when the input size is sufficiently >> large. >> + */ > >> Is the inverse true, i

Re: [PATCH] SVE popcount support

2025-02-04 Thread chiranmoy.bhattacha...@fujitsu.com
45.897 | 41.890 256 | 62.440 | 63.454 | 58.666 512 | 100.120 | 102.767 | 99.861 1024 | 159.574 | 158.594 |164.975 2048 | 282.354 | 281.198 |283.937 4096 | 532.038 | 531.068 |53

Re: [PATCH] SVE popcount support

2025-01-24 Thread Nathan Bossart
The meson configure check seems to fail on my machine: error: too many arguments to function call, expected 0, have 1 10 | svuint64_t popcnt = svcntb(val); | ~~ ^~~ error: returning '__SVInt64_t' from a function with incompa

Re: [PATCH] SVE popcount support

2025-01-22 Thread Nathan Bossart
On Wed, Jan 22, 2025 at 11:04:22AM +, chiranmoy.bhattacha...@fujitsu.com wrote: > If there is no further feedback from the community, may we submit the > patch for the next commit fest? I would encourage you to create a commitfest entry so that it is picked up by our automated patch testing t

Re: [PATCH] SVE popcount support

2025-01-22 Thread chiranmoy.bhattacha...@fujitsu.com
> This looks good. Thanks Chiranmoy and team. Can you address any other > feedback from Nathan or others here? Then we can pursue further reviews and > merging of the patch. Thank you for the review. If there is no further feedback from the community, may we submit the patch for the next commit

Re: [PATCH] SVE popcount support

2025-01-13 Thread Malladi, Rama
Here is the updated patch using pg_attribute_target("arch=armv8-a+sve") to compile the arch-specific function instead of using compiler flags. --- This looks good. Thanks Chiranmoy and team. Can you address any other feedback from Nathan or others here? Then we can pursue further reviews an

Re: [PATCH] SVE popcount support

2025-01-10 Thread chiranmoy.bhattacha...@fujitsu.com
Hi all, Here is the updated patch using pg_attribute_target("arch=armv8-a+sve") to compile the arch-specific function instead of using compiler flags. --- Chiranmoy v3-0001-SVE-support-for-popcount-and-popcount-masked.patch Description: v3-0001-SVE-support-for-popcount-an

Re: [PATCH] SVE popcount support

2024-12-11 Thread chiranmoy.bhattacha...@fujitsu.com
ubject: Re: [PATCH] SVE popcount support On 12/9/24 12:21 AM, devanga.susmi...@fujitsu.com<mailto:devanga.susmi...@fujitsu.com> wrote: Hello, We are sharing our patch for pg_popcount with SVE support as a contribution from our side in this thread. We hope this contribution will help i

Re: [PATCH] SVE popcount support

2024-12-09 Thread Malladi, Rama
On 12/9/24 12:21 AM, devanga.susmi...@fujitsu.com wrote: Hello, We are sharing our patch for pg_popcount with SVE support as a contribution from our side in this thread. We hope this contribution will help in exploring and refining the popcount implementation further. Our patch uses the

Re: Popcount optimization using SVE for ARM

2024-12-09 Thread devanga.susmi...@fujitsu.com
: Susmitha, Devanga ; pgsql-hackers ; Hajela, Ragesh ; Bhattacharya, Chiranmoy ; M A, Rajat Subject: Re: Popcount optimization using SVE for ARM I suggest we move this discussion to the existing thread on this subject: https://www.postgresql.org/message-id/flat/010101936e4aaa70-b474ab9e

Re: Popcount optimization using SVE for ARM

2024-12-06 Thread Nathan Bossart
I suggest we move this discussion to the existing thread on this subject: https://www.postgresql.org/message-id/flat/010101936e4aaa70-b474ab9e-b9ce-474d-a3ba-a3dc223d295c-00%40us-west-2.amazonses.com -- nathan

Re: Popcount optimization using SVE for ARM

2024-12-06 Thread Kirill Reshke
I did not yet look into this in detail, but please note that PostgreSQL comments style is /**/ not //. Also, please, do not top post on this list

Re: Popcount optimization using SVE for ARM

2024-12-06 Thread devanga.susmi...@fujitsu.com
method, to determine the correct popcount implementation based on the architecture, thereby requiring fewer code changes. The patch also includes implementations for popcount32, popcount64 and popcount masked. We'd be happy to discuss any potential overlaps and collaborate further to

Re: Popcount optimization using SVE for ARM

2024-12-05 Thread Kirill Reshke
On Fri, 6 Dec 2024 at 10:54, devanga.susmi...@fujitsu.com < devanga.susmi...@fujitsu.com> wrote: > Hello, This email is to discuss the contribution of the speed-up > popcount and popcount mask feature we have developed for the ARM > architecture using SVE intrinsics. > The

Re: [PATCH] SVE popcount support

2024-12-04 Thread Nathan Bossart
system (https://commitfest.postgresql.org/) so that it is picked up by our automated patch testing tools? > +# Check for ARMv8 SVE popcount intrinsics > +# > +CFLAGS_POPCNT="" > +PG_POPCNT_OBJS="" > +PGAC_SVE_POPCNT_INTRINSICS([]) > +if test x&

Re: [PATCH] SVE popcount support

2024-11-29 Thread Bruce Momjian
On Wed, Nov 27, 2024 at 03:43:27PM +, Malladi, Rama wrote: > • Attachments protected by Amazon: > • 0001-SVE-popcount-support.patch | > • SVE-popcount-support-PostgreSQL.png | > > Amazon has replaced the attachments in this email with download links. > Downloads will

Re: [PATCH] SVE popcount support

2024-11-29 Thread Kirill Reshke
On Thu, 28 Nov 2024 at 20:22, Malladi, Rama wrote: > > Attachments protected by Amazon: 0001-SVE-popcount-support.patch | SVE-popcount-support-PostgreSQL.png | > Amazon has replaced the attachments in this email with download links. Downloads will be available until December 27, 2024, 1

Re: [PATCH] SVE popcount support

2024-11-28 Thread Kirill Reshke
On Thu, 28 Nov 2024 at 20:22, Malladi, Rama wrote: > > Attachments protected by Amazon: 0001-SVE-popcount-support.patch | > SVE-popcount-support-PostgreSQL.png | > Amazon has replaced the attachments in this email with download links. > Downloads will be available until December

[PATCH] SVE popcount support

2024-11-28 Thread Malladi, Rama
Attachments protected by Amazon: [0001-SVE-popcount-support.patch] https://us-west-2.secure-attach.amazon.com/a29c9ff9-1f9b-430f-9b3c-07fde9a419aa/f9178627-0600-4527-bc5c-7e4cb9ef6e9a [SVE-popcount-support-PostgreSQL.png] https://us-west-2.secure-attach.amazon.com/a29c9ff9-1f9b-430f-9b3c

Re: Popcount optimization using AVX512

2024-11-07 Thread Nathan Bossart
On Thu, Nov 07, 2024 at 08:38:21PM +, Devulapalli, Raghuveer wrote: > >> Of course, as soon as I committed this, I noticed that it's broken. It >> seems that >> compilers are rather picky about how multiple target options are specified. > > Just curious, which compiler complained? Clang.

RE: Popcount optimization using AVX512

2024-11-07 Thread Devulapalli, Raghuveer
> Of course, as soon as I committed this, I noticed that it's broken. It seems > that > compilers are rather picky about how multiple target options are specified. Just curious, which compiler complained? Raghuveer

Re: Popcount optimization using AVX512

2024-11-07 Thread Nathan Bossart
On Thu, Nov 07, 2024 at 02:03:04PM -0600, Nathan Bossart wrote: > Committed. Of course, as soon as I committed this, I noticed that it's broken. It seems that compilers are rather picky about how multiple target options are specified. AFAICT the commonly supported syntax is to put the entire lis

Re: Popcount optimization using AVX512

2024-11-07 Thread Nathan Bossart
Committed. -- nathan

Re: Popcount optimization using AVX512

2024-11-07 Thread Nathan Bossart
On Thu, Nov 07, 2024 at 11:12:37AM -0500, Andres Freund wrote: > One thing that'd I'd like to see this being used is to elide the indirection > when the current target platform *already* supports the necessary > intrinsics. Adding a bunch of indirection for short & common operations is > decidedly

Re: Popcount optimization using AVX512

2024-11-07 Thread Andres Freund
Hi, On 2024-11-06 20:26:47 -0600, Nathan Bossart wrote: > From d0fb7e0e375f7b76d4df90910c21e9448dd3b380 Mon Sep 17 00:00:00 2001 > From: Nathan Bossart > Date: Wed, 16 Oct 2024 15:57:55 -0500 > Subject: [PATCH v3 1/1] use __attribute__((target(...))) for AVX-512 stuff One thing that'd I'd like t

Re: Popcount optimization using AVX512

2024-11-06 Thread Nathan Bossart
conftest$ac_exeext conftest.$ac_ext -CFLAGS="$pgac_save_CFLAGS" fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv_xsave_intrinsics__mxsave" >&5 -$as_echo "$pgac_cv_xsave_intrinsics__mxsave" >&6; } -if test x"$pgac_cv_xsave_intri

Re: Popcount optimization using AVX512

2024-11-04 Thread Nathan Bossart
On Thu, Oct 31, 2024 at 07:58:06PM +, Devulapalli, Raghuveer wrote: > LGTM. Thanks. Barring additional feedback, I plan to commit this soon. -- nathan

Re: Popcount optimization using AVX512

2024-10-31 Thread Nathan Bossart
$LINENO}: result: $pgac_cv_xsave_intrinsics__mxsave" >&5 -$as_echo "$pgac_cv_xsave_intrinsics__mxsave" >&6; } -if test x"$pgac_cv_xsave_intrinsics__mxsave" = x"yes"; then - CFLAGS_XSAVE="-mxsave" +{ $as_echo "$as_

RE: Popcount optimization using AVX512

2024-10-31 Thread Devulapalli, Raghuveer
> Here is an updated patch with this change. LGTM. Raghuveer

RE: Popcount optimization using AVX512

2024-10-30 Thread Devulapalli, Raghuveer
> Oh, good catch. IIUC we only need to check for #ifndef _MSC_VER in the > configure programs for meson. pg_attribute_target will be empty on MSVC, and > I > believe we only support meson builds there. Right. __has_attribute (target) produces a compiler warning on MSVC: https://gcc.godbolt.o

Re: Popcount optimization using AVX512

2024-10-30 Thread Nathan Bossart
On Wed, Oct 30, 2024 at 08:53:10PM +, Raghuveer Devulapalli wrote: > BTW, I just realized function attributes for xsave and avx512 don't work > on MSVC (see > https://developercommunity.visualstudio.com/t/support-function-target-attribute-and-mutiversioning/10130630). > Not sure if you care abo

Re: Popcount optimization using AVX512

2024-10-30 Thread Raghuveer Devulapalli
BTW, I just realized function attributes for xsave and avx512 don't work on MSVC (see https://developercommunity.visualstudio.com/t/support-function-target-attribute-and-mutiversioning/10130630). Not sure if you care about it. Its an easy fix (see https://gcc.godbolt.org/z/Pebdj3vMx).

Re: Popcount optimization using AVX512

2024-10-29 Thread Raghuveer Devulapalli
The following review has been posted through the commitfest application: make installcheck-world: tested, failed Implements feature: tested, failed Spec compliant: tested, failed Documentation:tested, failed Changes LGTM. Makes the Makefile look clean. Built and ran te

Re: Popcount optimization using AVX512

2024-10-16 Thread Nathan Bossart
y_link "$LINENO"; then : - pgac_cv_xsave_intrinsics__mxsave=yes + pgac_cv_xsave_intrinsics=yes else - pgac_cv_xsave_intrinsics__mxsave=no + pgac_cv_xsave_intrinsics=no fi rm -f core conftest.err conftest.$ac_objext \ conftest$ac_exeext conftest.$ac_ext -CFLAGS="$pgac_save_

Re: Popcount optimization using AVX512

2024-10-08 Thread Nathan Bossart
On Wed, Jul 31, 2024 at 04:43:02PM -0500, Nathan Bossart wrote: > On Wed, Jul 31, 2024 at 01:52:54PM -0700, Andres Freund wrote: >> I think we'd be better off enabling architectural features on a per-function >> basis, roughly like this: >> >> [...] >> >> /* FIXME: Should be gated by configure che

Re: Popcount optimization using AVX512

2024-07-31 Thread Nathan Bossart
On Wed, Jul 31, 2024 at 01:52:54PM -0700, Andres Freund wrote: > On 2024-07-30 22:12:18 -0500, Nathan Bossart wrote: >> As I started on this, I remembered why I needed it. The file >> pg_popcount_avx512_choose.c is compiled without the AVX-512 flags in order >> to avoid inadvertently issuing any A

Re: Popcount optimization using AVX512

2024-07-31 Thread Andres Freund
Hi, On 2024-07-30 22:12:18 -0500, Nathan Bossart wrote: > On Tue, Jul 30, 2024 at 10:01:50PM -0500, Nathan Bossart wrote: > > On Tue, Jul 30, 2024 at 07:43:08PM -0700, Andres Freund wrote: > >> My point is that _xgetbv() is made available by -mavx512vpopcntdq > >> -mavx512bw > >> alone, without n

Re: Popcount optimization using AVX512

2024-07-30 Thread Nathan Bossart
On Tue, Jul 30, 2024 at 10:01:50PM -0500, Nathan Bossart wrote: > On Tue, Jul 30, 2024 at 07:43:08PM -0700, Andres Freund wrote: >> My point is that _xgetbv() is made available by -mavx512vpopcntdq -mavx512bw >> alone, without needing -mxsave: > > Oh, I see. I'll work on a patch to remove that co

Re: Popcount optimization using AVX512

2024-07-30 Thread Nathan Bossart
On Tue, Jul 30, 2024 at 07:43:08PM -0700, Andres Freund wrote: > On 2024-07-30 21:01:31 -0500, Nathan Bossart wrote: >> The main purpose of the XSAVE compiler check is to determine whether we >> need to add -mxsave in order to use _xgetbv() [0]. If that wasn't a >> factor, we could probably skip i

Re: Popcount optimization using AVX512

2024-07-30 Thread Andres Freund
Hi, On 2024-07-30 21:01:31 -0500, Nathan Bossart wrote: > On Tue, Jul 30, 2024 at 06:46:51PM -0700, Andres Freund wrote: > > On 2024-07-30 20:20:34 -0500, Nathan Bossart wrote: > >> On Tue, Jul 30, 2024 at 05:49:59PM -0700, Andres Freund wrote: > >> > Why are we actually checking for xsave? We're

Re: Popcount optimization using AVX512

2024-07-30 Thread Nathan Bossart
On Tue, Jul 30, 2024 at 06:46:51PM -0700, Andres Freund wrote: > On 2024-07-30 20:20:34 -0500, Nathan Bossart wrote: >> On Tue, Jul 30, 2024 at 05:49:59PM -0700, Andres Freund wrote: >> > Why are we actually checking for xsave? We're not using xsave itself and I >> > couldn't find a comment in 7927

Re: Popcount optimization using AVX512

2024-07-30 Thread Andres Freund
Hi, On 2024-07-30 20:20:34 -0500, Nathan Bossart wrote: > On Tue, Jul 30, 2024 at 05:49:59PM -0700, Andres Freund wrote: > > Ah, I somehow thought we'd avoid the runtime check in case we determine at > > compile time we don't need any extra flags to enable the AVX512 stuff > > (similar > > to how

Re: Popcount optimization using AVX512

2024-07-30 Thread Nathan Bossart
On Tue, Jul 30, 2024 at 05:49:59PM -0700, Andres Freund wrote: > Ah, I somehow thought we'd avoid the runtime check in case we determine at > compile time we don't need any extra flags to enable the AVX512 stuff (similar > to how we deal with crc32). But it looks like that's not the case - which >

Re: Popcount optimization using AVX512

2024-07-30 Thread Thomas Munro
On Wed, Jul 31, 2024 at 12:50 PM Andres Freund wrote: > It's one thing for the avx512 path to have that overhead, but it's > particularly absurd for pg_popcount32/pg_popcount64, where > > a) The function call overhead is a larger proportion of the cost. > b) the instruction is almost universally a

Re: Popcount optimization using AVX512

2024-07-30 Thread Andres Freund
Hi, On 2024-07-30 16:32:07 -0500, Nathan Bossart wrote: > On Tue, Jul 30, 2024 at 02:07:01PM -0700, Andres Freund wrote: > > Now, a reasonable counter-argument would be that only some of these macros > > are > > defined for msvc ([1]). However, as it turns out, the test is broken > > today, as m

Re: Popcount optimization using AVX512

2024-07-30 Thread Nathan Bossart
On Tue, Jul 30, 2024 at 04:32:07PM -0500, Nathan Bossart wrote: > On Tue, Jul 30, 2024 at 02:07:01PM -0700, Andres Freund wrote: >> Afaict we could just check for predefined preprocessor macros: >> >> echo|time gcc -c -mxsave -mavx512vpopcntdq -mavx512bw -xc -dM -E - -o >> -|grep -E '__XSAVE__|_

Re: Popcount optimization using AVX512

2024-07-30 Thread Nathan Bossart
ES > [7.481] Checking if "XSAVE intrinsics without -mxsave" : links: NO > [8.097] Checking if "XSAVE intrinsics with -mxsave" : links: YES > [8.641] Checking if "AVX-512 popcount without -mavx512vpopcntdq -mavx512bw" : > links: NO > [9.183] Checking

Re: Popcount optimization using AVX512

2024-07-30 Thread Andres Freund
hecking if " __atomic_compare_exchange_n(int32)" : links: YES [6.940] Checking if " __atomic_compare_exchange_n(int64)" : links: YES [7.481] Checking if "XSAVE intrinsics without -mxsave" : links: NO [8.097] Checking if "XSAVE intrinsics with -mxsave" : links: YE

Re: Popcount optimization using AVX512

2024-04-23 Thread Nathan Bossart
On Thu, Apr 18, 2024 at 05:13:58PM -0500, Nathan Bossart wrote: > Makes sense, thanks. I'm planning to commit this fix sometime early next > week. Committed. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com

Re: Popcount optimization using AVX512

2024-04-18 Thread Nathan Bossart
On Thu, Apr 18, 2024 at 10:11:08PM +, Devulapalli, Raghuveer wrote: >> On that note, is it necessary to also check for avx512f? At the moment, >> we are assuming that's supported if the other AVX-512 instructions are >> available. > > No, it's not needed. There are no CPU's with avx512bw/avx5

RE: Popcount optimization using AVX512

2024-04-18 Thread Devulapalli, Raghuveer
> On that note, is it necessary to also check for avx512f? At the moment, we > are assuming that's supported if the other AVX-512 instructions are available. No, it's not needed. There are no CPU's with avx512bw/avx512popcnt without avx512f. Unfortunately though, avx512popcnt does not mean avx

Re: Popcount optimization using AVX512

2024-04-18 Thread Nathan Bossart
here's support for XSAVE instructions? */ -bool -pg_popcount_avx512_available(void) +static inline bool +xsave_available(void) { unsigned int exx[4] = {0, 0, 0, 0}; - /* Does CPUID say there's support for AVX-512 popcount instructions? */ -#if defined(HAVE__GET_CPUID_COUNT) - __get_cpui

RE: Popcount optimization using AVX512

2024-04-18 Thread Devulapalli, Raghuveer
> Thanks for the feedback. I've attached an updated patch. (1) Shouldn't it be: return (_xgetbv(0) & 0xe6) == 0xe6; ? Otherwise zmm_regs_available() will return false. (2) Nitpick: avx512_popcnt_available and avx512_bw_available() run the same cpuid leaf. You could combine them into one to avo

Re: Popcount optimization using AVX512

2024-04-18 Thread Nathan Bossart
equired for the AVX-512 - * pg_popcount() implementation. + * Does CPUID say there's support for XSAVE instructions? */ -bool -pg_popcount_avx512_available(void) +static inline bool +xsave_available(void) { unsigned int exx[4] = {0, 0, 0, 0}; - /* Does CPUID say there's support fo

Re: Popcount optimization using AVX512

2024-04-18 Thread Nathan Bossart
> (Step 3 can be done in any order relative to 1 and 2.)" Thanks for confirming. IIUC my patch should be sufficient, then. > It also seems that step 1 and step 2 need to be done prior to the CPUID > OSXSAVE check in the popcount code. This seems to contradict the note about doin

RE: Popcount optimization using AVX512

2024-04-18 Thread Shankaran, Akash
ons supported). (Step 3 can be done in any order relative to 1 and 2.)" It also seems that step 1 and step 2 need to be done prior to the CPUID OSXSAVE check in the popcount code. [0]: https://cdrdv2.intel.com/v1/dl/getContent/671200 - Akash Shankaran

Re: Popcount optimization using AVX512

2024-04-17 Thread Nathan Bossart
It was brought to my attention [0] that we probably should be checking for the OSXSAVE bit instead of the XSAVE bit when determining whether there's support for the XGETBV instruction. IIUC that should indicate that both the OS and the processor have XGETBV support (not just the processor). I've a

Re: Popcount optimization using AVX512

2024-04-07 Thread Tom Lane
Nathan Bossart writes: > On Sun, Apr 07, 2024 at 08:23:32PM -0500, Nathan Bossart wrote: >> The Intel documentation for _mm256_undefined_si256() [0] >> indicates that it is intended to return "undefined elements," so it seems >> like the use of an uninitialized variable might be intentional. > Se

Re: Popcount optimization using AVX512

2024-04-07 Thread Nathan Bossart
On Sun, Apr 07, 2024 at 08:23:32PM -0500, Nathan Bossart wrote: > The Intel documentation for _mm256_undefined_si256() [0] > indicates that it is intended to return "undefined elements," so it seems > like the use of an uninitialized variable might be intentional. See also https://gcc.gnu.org/git/

Re: Popcount optimization using AVX512

2024-04-07 Thread Nathan Bossart
On Sun, Apr 07, 2024 at 08:42:12PM -0400, Tom Lane wrote: > Today's Coverity run produced this warning, which seemingly was > triggered by one of these commits, but I can't make much sense > of it: > > *** CID 1596255: Uninitialized variables (UNINIT) > /usr/lib/gcc/x86_64-linux-gnu/10/include/a

Re: Popcount optimization using AVX512

2024-04-07 Thread Tom Lane
Nathan Bossart writes: > Here is what I have staged for commit, which I intend to do shortly. Today's Coverity run produced this warning, which seemingly was triggered by one of these commits, but I can't make much sense of it: *** CID 1596255: Uninitialized variables (UNINIT) /usr/lib/gcc/x86

Re: Popcount optimization using AVX512

2024-04-06 Thread Nathan Bossart
On Sat, Apr 06, 2024 at 02:41:01PM -0500, Nathan Bossart wrote: > Here is what I have staged for commit, which I intend to do shortly. Committed. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com

Re: Popcount optimization using AVX512

2024-04-06 Thread Nathan Bossart
{pgac_cv_xsave_intrinsics__mxsave+:} false; then : + $as_echo_n "(cached) " >&6 +else + pgac_save_CFLAGS=$CFLAGS +CFLAGS="$pgac_save_CFLAGS -mxsave" +cat confdefs.h - <<_ACEOF >conftest.$ac_ext +/* end confdefs.h. */ +#include +int +main () +{ +return _xgetbv(0) &am

Re: Popcount optimization using AVX512

2024-04-05 Thread David Rowley
On Sat, 6 Apr 2024 at 14:17, Nathan Bossart wrote: > > On Sat, Apr 06, 2024 at 12:08:14PM +1300, David Rowley wrote: > > Won't Valgrind complain about this? > > > > +pg_popcount_avx512(const char *buf, int bytes) > > > > + buf = (const char *) TYPEALIGN_DOWN(sizeof(__m512i), buf); > > > > + val =

Re: Popcount optimization using AVX512

2024-04-05 Thread Nathan Bossart
On Sat, Apr 06, 2024 at 12:08:14PM +1300, David Rowley wrote: > Won't Valgrind complain about this? > > +pg_popcount_avx512(const char *buf, int bytes) > > + buf = (const char *) TYPEALIGN_DOWN(sizeof(__m512i), buf); > > + val = _mm512_maskz_loadu_epi8(mask, (const __m512i *) buf); I haven't be

Re: Popcount optimization using AVX512

2024-04-05 Thread David Rowley
On Sat, 6 Apr 2024 at 04:38, Nathan Bossart wrote: > This seems to provide a small performance boost, so I've incorporated it > into v27. Won't Valgrind complain about this? +pg_popcount_avx512(const char *buf, int bytes) + buf = (const char *) TYPEALIGN_DOWN(sizeof(__m512i), buf); + val = _mm

Re: Popcount optimization using AVX512

2024-04-05 Thread Nathan Bossart
Bossart Amazon Web Services: https://aws.amazon.com >From 9fc4b7556b72d51fce676db84b446099767efff3 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Wed, 27 Mar 2024 16:39:24 -0500 Subject: [PATCH v27 1/2] AVX512 popcount support --- config/c-compiler.m4 | 58 ++ configu

Re: Popcount optimization using AVX512

2024-04-05 Thread Nathan Bossart
On Fri, Apr 05, 2024 at 10:33:27AM +0300, Ants Aasma wrote: > The main issue I saw was that clang was able to peel off the first > iteration of the loop and then eliminate the mask assignment and > replace masked load with a memory operand for vpopcnt. I was not able > to convince gcc to do that re

Re: Popcount optimization using AVX512

2024-04-05 Thread Ants Aasma
On Fri, 5 Apr 2024 at 07:15, Nathan Bossart wrote: > Here is an updated patch set. IMHO this is in decent shape and is > approaching committable. I checked the code generation on various gcc and clang versions. It looks mostly fine starting from versions where avx512 is supported, gcc-7.1 and cl

Re: Popcount optimization using AVX512

2024-04-04 Thread Nathan Bossart
ect: [PATCH v26 1/2] AVX512 popcount support --- config/c-compiler.m4 | 58 ++ configure| 252 +++ configure.ac | 51 ++ meson.build | 87 + src/Makefile.global

Re: Popcount optimization using AVX512

2024-04-04 Thread Nathan Bossart
On Thu, Apr 04, 2024 at 04:02:53PM +0300, Ants Aasma wrote: > Speaking of which, what does bumping up the inlined version threshold > to 16 do with and without AVX-512 available? Linearly extrapolating > the 2 and 4 byte numbers it might just come ahead in both cases, > making the choice easy. IIR

Re: Popcount optimization using AVX512

2024-04-04 Thread Nathan Bossart
On Thu, Apr 04, 2024 at 04:28:58PM +1300, David Rowley wrote: > On Thu, 4 Apr 2024 at 11:50, Nathan Bossart wrote: >> If we can verify this approach won't cause segfaults and can stomach the >> regression between 8 and 16 bytes, I'd happily pivot to this approach so >> that we can avoid the functi

Re: Popcount optimization using AVX512

2024-04-04 Thread Ants Aasma
On Thu, 4 Apr 2024 at 01:50, Nathan Bossart wrote: > If we can verify this approach won't cause segfaults and can stomach the > regression between 8 and 16 bytes, I'd happily pivot to this approach so > that we can avoid the function call dance that I have in v25. The approach I posted does not r

Re: Popcount optimization using AVX512

2024-04-03 Thread David Rowley
On Thu, 4 Apr 2024 at 11:50, Nathan Bossart wrote: > If we can verify this approach won't cause segfaults and can stomach the > regression between 8 and 16 bytes, I'd happily pivot to this approach so > that we can avoid the function call dance that I have in v25. > > Thoughts? If we're worried a

Re: Popcount optimization using AVX512

2024-04-03 Thread Nathan Bossart
On Tue, Apr 02, 2024 at 11:30:39PM +0300, Ants Aasma wrote: > On Tue, 2 Apr 2024 at 00:31, Nathan Bossart wrote: >> On Tue, Apr 02, 2024 at 12:11:59AM +0300, Ants Aasma wrote: >> > What about using the masking capabilities of AVX-512 to handle the >> > tail in the same code path? Masked out portio

  1   2   3   >