Committed.
On Fri, Mar 28, 2025 at 10:25:26AM -0500, Nathan Bossart wrote:
> On Thu, Mar 27, 2025 at 03:31:27PM +0700, John Naylor wrote:
>> On Thu, Mar 27, 2025 at 10:38 AM Nathan Bossart
>> wrote:
>>> I also noticed a silly mistake in 0003 that would cause us to potentially
>>> skip part of the
On Thu, Mar 27, 2025 at 03:31:27PM +0700, John Naylor wrote:
> On Thu, Mar 27, 2025 at 10:38 AM Nathan Bossart
> wrote:
>> I also noticed a silly mistake in 0003 that would cause us to potentially
>> skip part of the tail. That should be fixed now.
>
> I'm not sure whether that meant it could re
On Thu, Mar 27, 2025 at 10:38 AM Nathan Bossart
wrote:
> I also noticed a silly mistake in 0003 that would cause us to potentially
> skip part of the tail. That should be fixed now.
I'm not sure whether that meant it could return the wrong answer, or
just make more work for paths further down.
I
/*
USE_AVX512_POPCNT_WITH_RUNTIME_CHECK */
--
2.39.5 (Apple Git-154)
>From ee81eded16a5b7987b0fdf180f6a411bef2810b6 Mon Sep 17 00:00:00 2001
From: Nathan Bossart
Date: Mon, 24 Mar 2025 20:10:23 -0500
Subject: [PATCH v10 2/3] Add Neon popcount support.
This commit introduces
/* TRY_POPCNT_FAST */
+#endif /* TRY_POPCNT_X86_64 */
#endif /*
USE_AVX512_POPCNT_WITH_RUNTIME_CHECK */
--
2.39.5 (Apple Git-154)
>From 5953da8e6c4d167954cbedfca58bd7558feb8620 Mon Sep 17 00:00:00
On Mon, Mar 24, 2025 at 06:34:45PM +0700, John Naylor wrote:
> On Sat, Mar 22, 2025 at 10:42 AM Nathan Bossart
> wrote:
>> * 0002 introduces the Neon implementation, which conveniently doesn't need
>> configure-time checks or function pointers. I noticed that some
>> compilers (e.g., Apple cl
On Sat, Mar 22, 2025 at 10:42 AM Nathan Bossart
wrote:
> * 0002 introduces the Neon implementation, which conveniently doesn't need
> configure-time checks or function pointers. I noticed that some
> compilers (e.g., Apple clang 16) compile in Neon instructions already,
> but our hand-roll
Looks good, the code is more readable now.
> For both Neon and SVE, I do see improvements with looping over 4
> registers at a time, so IMHO it's worth doing so even if it performs the
> same as 2-register blocks on some hardware.
There was no regression on Graviton 3 when using the 4-register
TRY_POPCNT_FAST */
+#endif /* POPCNT_X86_64 */
#endif /*
USE_AVX512_POPCNT_WITH_RUNTIME_CHECK */
--
2.39.5 (Apple Git-154)
>From 3ebc1321e6782919980d3410d3bc527fd77751fc Mon Sep 17 00:00:00 2001
On Wed, Mar 13, 2025 at 12:02:07AM +, nathandboss...@gmail.com wrote:
> Those are nice results. I'm a little worried about the Neon implementation
> for smaller inputs since it uses a per-byte loop for the remaining bytes,
> though. If we can ensure there's no regression there, I think this p
On Wed, Mar 12, 2025 at 01:35:39PM -0500, Nathan Bossart wrote:
> Thanks for the quick review. I'll plan on committing this shortly if CI is
> happy.
Committed.
--
nathan
On 2025-Mar-12, Nathan Bossart wrote:
> On Wed, Mar 12, 2025 at 05:23:25PM +0100, Álvaro Herrera wrote:
> > Strange: this code is not covered by any tests.
> >
> > https://coverage.postgresql.org/src/backend/utils/adt/acl.c.gcov.html#5533
> > https://coverage.postgresql.org/src/backend/utils/adt/
On Wed, Mar 12, 2025 at 07:34:16PM +0100, Álvaro Herrera wrote:
> Thanks :-) I confirm that this covers the code in select_best_grantor
> that you're modifying.
Thanks for the quick review. I'll plan on committing this shortly if CI is
happy.
--
nathan
l, it's easy enough to add some basic tests for the grantor
selection machinery. Here's a first try.
--
nathan
>From d3cf9ca237f647ebcca20c55c8302f00f716c459 Mon Sep 17 00:00:00 2001
From: Nathan Bossart
Date: Wed, 12 Mar 2025 10:45:12 -0500
Subject: [PATCH v2 1/1] Remove open-coded pop
On Wed, Mar 12, 2025 at 10:34:46AM +, chiranmoy.bhattacha...@fujitsu.com
wrote:
> On Wed, Mar 12, 2025 at 02:41:18AM +, nathandboss...@gmail.com wrote:
>
>> v5-no-sve is the result of using a function pointer, but pointing to the
>> "slow" versions instead of the SVE version. v5-sve is t
On 2025-Mar-12, Nathan Bossart wrote:
> There's a count_one_bits() function in acl.c that can be replaced with a
> call to pg_popcount64(). This isn't performance-critical code, but IMHO we
> might as well use the centralized implementation.
Makes sense. Patch looks good to me.
> @@ -5532,7 +5
From: Nathan Bossart
Date: Wed, 12 Mar 2025 10:45:12 -0500
Subject: [PATCH v1 1/1] Remove open-coded popcount in acl.c.
---
src/backend/utils/adt/acl.c | 20 +---
1 file changed, 1 insertion(+), 19 deletions(-)
diff --git a/src/backend/utils/adt/acl.c b/src/backend/utils/ad
On Wed, Mar 12, 2025 at 02:41:18AM +, nathandboss...@gmail.com wrote:
> v5-no-sve is the result of using a function pointer, but pointing to the
> "slow" versions instead of the SVE version. v5-sve is the result of the
> latest patch in this thread on a machine with SVE support, and v5-4reg i
On Fri, Mar 07, 2025 at 03:20:07AM +, chiranmoy.bhattacha...@fujitsu.com
wrote:
> Sounds good. Let us know your findings.
Alright, here's what I saw on an R8g for drive_popcount(100, N):
8-byte words master v5-no-svev5-sve v5-4reg
1 2.540 ms 2.170
> Interesting. I do see different assembly with the 2 and 4 register
> versions, but I didn't get to testing it on a machine with SVE support
> today.
> Besides some additional benchmarking, I might make some small adjustments
> to the patch. But overall, it seems to be in decent shape.
Sounds
On Wed, Feb 19, 2025 at 09:31:50AM +, chiranmoy.bhattacha...@fujitsu.com
wrote:
>> Hm. Any idea why that is? I wonder if the compiler isn't using as many
>> SVE registers as it could for this.
>
> Not sure, we tried forcing loop unrolling using the below line in the MakeFile
> but the resul
oose" logic guarded by TRY_POPCNT_FAST.
The latest patch bypasses TRY_POPCNT_FAST by having a separate choose logic
for aarch64.
-Chiranmoy
v5-0001-SVE-support-for-popcount-and-popcount-masked.patch
Description: v5-0001-SVE-support-for-popcount-and-popcount-masked.patch
On Thu, Feb 06, 2025 at 10:33:35AM -0600, Nathan Bossart wrote:
> On Thu, Feb 06, 2025 at 08:44:35AM +, chiranmoy.bhattacha...@fujitsu.com
> wrote:
>>> Does this hand-rolled loop unrolling offer any particular advantage? What
>>> do the numbers look like if we don't do this or if we process,
On Thu, Feb 06, 2025 at 08:44:35AM +, chiranmoy.bhattacha...@fujitsu.com
wrote:
>> Does this hand-rolled loop unrolling offer any particular advantage? What
>> do the numbers look like if we don't do this or if we process, say, 4
>> vectors at a time?
>
> The unrolled version performs better
> Hm. These results are so similar that I'm tempted to suggest we just
> remove the section of code dedicated to alignment. Is there any reason not
> to do that?
It seems that the double load overhead from unaligned memory access isn’t
too taxing, even on larger inputs. We can remove it to simpl
On Tue, Feb 04, 2025 at 09:01:33AM +, chiranmoy.bhattacha...@fujitsu.com
wrote:
>> +/*
>> + * For smaller inputs, aligning the buffer degrades the performance.
>> + * Therefore, the buffers only when the input size is sufficiently
>> large.
>> + */
>
>> Is the inverse true, i
45.897 | 41.890
256 | 62.440 | 63.454 | 58.666
512 | 100.120 | 102.767 | 99.861
1024 | 159.574 | 158.594 |164.975
2048 | 282.354 | 281.198 |283.937
4096 | 532.038 | 531.068 |53
The meson configure check seems to fail on my machine:
error: too many arguments to function call, expected 0, have 1
10 | svuint64_t popcnt = svcntb(val);
| ~~ ^~~
error: returning '__SVInt64_t' from a function with incompa
On Wed, Jan 22, 2025 at 11:04:22AM +, chiranmoy.bhattacha...@fujitsu.com
wrote:
> If there is no further feedback from the community, may we submit the
> patch for the next commit fest?
I would encourage you to create a commitfest entry so that it is picked up
by our automated patch testing t
> This looks good. Thanks Chiranmoy and team. Can you address any other
> feedback from Nathan or others here? Then we can pursue further reviews and
> merging of the patch.
Thank you for the review.
If there is no further feedback from the community, may we submit the patch for
the next commit
Here is the updated patch using
pg_attribute_target("arch=armv8-a+sve") to compile the arch-specific
function instead of using compiler flags.
---
This looks good. Thanks Chiranmoy and team. Can you address any other
feedback from Nathan or others here? Then we can pursue further reviews
an
Hi all,
Here is the updated patch using pg_attribute_target("arch=armv8-a+sve") to
compile the arch-specific function instead of using compiler flags.
---
Chiranmoy
v3-0001-SVE-support-for-popcount-and-popcount-masked.patch
Description: v3-0001-SVE-support-for-popcount-an
ubject: Re: [PATCH] SVE popcount support
On 12/9/24 12:21 AM,
devanga.susmi...@fujitsu.com<mailto:devanga.susmi...@fujitsu.com> wrote:
Hello,
We are sharing our patch for pg_popcount with SVE support as a contribution
from our side in this thread. We hope this contribution will help i
On 12/9/24 12:21 AM, devanga.susmi...@fujitsu.com wrote:
Hello,
We are sharing our patch for pg_popcount with SVE support as a
contribution from our side in this thread. We hope this contribution
will help in exploring and refining the popcount implementation further.
Our patch uses the
: Susmitha, Devanga ; pgsql-hackers
; Hajela, Ragesh ;
Bhattacharya, Chiranmoy ; M A, Rajat
Subject: Re: Popcount optimization using SVE for ARM
I suggest we move this discussion to the existing thread on this subject:
https://www.postgresql.org/message-id/flat/010101936e4aaa70-b474ab9e
I suggest we move this discussion to the existing thread on this subject:
https://www.postgresql.org/message-id/flat/010101936e4aaa70-b474ab9e-b9ce-474d-a3ba-a3dc223d295c-00%40us-west-2.amazonses.com
--
nathan
I did not yet look into this in detail, but please note that PostgreSQL
comments style is /**/ not //. Also, please, do not top post on this list
method, to determine the correct popcount
implementation based on the architecture, thereby requiring fewer code changes.
The patch also includes implementations for popcount32, popcount64 and popcount
masked. We'd be happy to discuss any potential overlaps and collaborate further
to
On Fri, 6 Dec 2024 at 10:54, devanga.susmi...@fujitsu.com <
devanga.susmi...@fujitsu.com> wrote:
> Hello, This email is to discuss the contribution of the speed-up
> popcount and popcount mask feature we have developed for the ARM
> architecture using SVE intrinsics.
> The
system
(https://commitfest.postgresql.org/) so that it is picked up by our
automated patch testing tools?
> +# Check for ARMv8 SVE popcount intrinsics
> +#
> +CFLAGS_POPCNT=""
> +PG_POPCNT_OBJS=""
> +PGAC_SVE_POPCNT_INTRINSICS([])
> +if test x&
On Wed, Nov 27, 2024 at 03:43:27PM +, Malladi, Rama wrote:
> • Attachments protected by Amazon:
> • 0001-SVE-popcount-support.patch |
> • SVE-popcount-support-PostgreSQL.png |
>
> Amazon has replaced the attachments in this email with download links.
> Downloads will
On Thu, 28 Nov 2024 at 20:22, Malladi, Rama wrote:
>
> Attachments protected by Amazon: 0001-SVE-popcount-support.patch |
SVE-popcount-support-PostgreSQL.png |
> Amazon has replaced the attachments in this email with download links.
Downloads will be available until December 27, 2024, 1
On Thu, 28 Nov 2024 at 20:22, Malladi, Rama wrote:
>
> Attachments protected by Amazon: 0001-SVE-popcount-support.patch |
> SVE-popcount-support-PostgreSQL.png |
> Amazon has replaced the attachments in this email with download links.
> Downloads will be available until December
Attachments protected by Amazon:
[0001-SVE-popcount-support.patch]
https://us-west-2.secure-attach.amazon.com/a29c9ff9-1f9b-430f-9b3c-07fde9a419aa/f9178627-0600-4527-bc5c-7e4cb9ef6e9a
[SVE-popcount-support-PostgreSQL.png]
https://us-west-2.secure-attach.amazon.com/a29c9ff9-1f9b-430f-9b3c
On Thu, Nov 07, 2024 at 08:38:21PM +, Devulapalli, Raghuveer wrote:
>
>> Of course, as soon as I committed this, I noticed that it's broken. It
>> seems that
>> compilers are rather picky about how multiple target options are specified.
>
> Just curious, which compiler complained?
Clang.
> Of course, as soon as I committed this, I noticed that it's broken. It seems
> that
> compilers are rather picky about how multiple target options are specified.
Just curious, which compiler complained?
Raghuveer
On Thu, Nov 07, 2024 at 02:03:04PM -0600, Nathan Bossart wrote:
> Committed.
Of course, as soon as I committed this, I noticed that it's broken. It
seems that compilers are rather picky about how multiple target options are
specified. AFAICT the commonly supported syntax is to put the entire lis
Committed.
--
nathan
On Thu, Nov 07, 2024 at 11:12:37AM -0500, Andres Freund wrote:
> One thing that'd I'd like to see this being used is to elide the indirection
> when the current target platform *already* supports the necessary
> intrinsics. Adding a bunch of indirection for short & common operations is
> decidedly
Hi,
On 2024-11-06 20:26:47 -0600, Nathan Bossart wrote:
> From d0fb7e0e375f7b76d4df90910c21e9448dd3b380 Mon Sep 17 00:00:00 2001
> From: Nathan Bossart
> Date: Wed, 16 Oct 2024 15:57:55 -0500
> Subject: [PATCH v3 1/1] use __attribute__((target(...))) for AVX-512 stuff
One thing that'd I'd like t
conftest$ac_exeext conftest.$ac_ext
-CFLAGS="$pgac_save_CFLAGS"
fi
-{ $as_echo "$as_me:${as_lineno-$LINENO}: result:
$pgac_cv_xsave_intrinsics__mxsave" >&5
-$as_echo "$pgac_cv_xsave_intrinsics__mxsave" >&6; }
-if test x"$pgac_cv_xsave_intri
On Thu, Oct 31, 2024 at 07:58:06PM +, Devulapalli, Raghuveer wrote:
> LGTM.
Thanks. Barring additional feedback, I plan to commit this soon.
--
nathan
$LINENO}: result:
$pgac_cv_xsave_intrinsics__mxsave" >&5
-$as_echo "$pgac_cv_xsave_intrinsics__mxsave" >&6; }
-if test x"$pgac_cv_xsave_intrinsics__mxsave" = x"yes"; then
- CFLAGS_XSAVE="-mxsave"
+{ $as_echo "$as_
> Here is an updated patch with this change.
LGTM.
Raghuveer
> Oh, good catch. IIUC we only need to check for #ifndef _MSC_VER in the
> configure programs for meson. pg_attribute_target will be empty on MSVC, and
> I
> believe we only support meson builds there.
Right. __has_attribute (target) produces a compiler warning on MSVC:
https://gcc.godbolt.o
On Wed, Oct 30, 2024 at 08:53:10PM +, Raghuveer Devulapalli wrote:
> BTW, I just realized function attributes for xsave and avx512 don't work
> on MSVC (see
> https://developercommunity.visualstudio.com/t/support-function-target-attribute-and-mutiversioning/10130630).
> Not sure if you care abo
BTW, I just realized function attributes for xsave and avx512 don't work on
MSVC (see
https://developercommunity.visualstudio.com/t/support-function-target-attribute-and-mutiversioning/10130630).
Not sure if you care about it. Its an easy fix (see
https://gcc.godbolt.org/z/Pebdj3vMx).
The following review has been posted through the commitfest application:
make installcheck-world: tested, failed
Implements feature: tested, failed
Spec compliant: tested, failed
Documentation:tested, failed
Changes LGTM. Makes the Makefile look clean. Built and ran te
y_link "$LINENO"; then :
- pgac_cv_xsave_intrinsics__mxsave=yes
+ pgac_cv_xsave_intrinsics=yes
else
- pgac_cv_xsave_intrinsics__mxsave=no
+ pgac_cv_xsave_intrinsics=no
fi
rm -f core conftest.err conftest.$ac_objext \
conftest$ac_exeext conftest.$ac_ext
-CFLAGS="$pgac_save_
On Wed, Jul 31, 2024 at 04:43:02PM -0500, Nathan Bossart wrote:
> On Wed, Jul 31, 2024 at 01:52:54PM -0700, Andres Freund wrote:
>> I think we'd be better off enabling architectural features on a per-function
>> basis, roughly like this:
>>
>> [...]
>>
>> /* FIXME: Should be gated by configure che
On Wed, Jul 31, 2024 at 01:52:54PM -0700, Andres Freund wrote:
> On 2024-07-30 22:12:18 -0500, Nathan Bossart wrote:
>> As I started on this, I remembered why I needed it. The file
>> pg_popcount_avx512_choose.c is compiled without the AVX-512 flags in order
>> to avoid inadvertently issuing any A
Hi,
On 2024-07-30 22:12:18 -0500, Nathan Bossart wrote:
> On Tue, Jul 30, 2024 at 10:01:50PM -0500, Nathan Bossart wrote:
> > On Tue, Jul 30, 2024 at 07:43:08PM -0700, Andres Freund wrote:
> >> My point is that _xgetbv() is made available by -mavx512vpopcntdq
> >> -mavx512bw
> >> alone, without n
On Tue, Jul 30, 2024 at 10:01:50PM -0500, Nathan Bossart wrote:
> On Tue, Jul 30, 2024 at 07:43:08PM -0700, Andres Freund wrote:
>> My point is that _xgetbv() is made available by -mavx512vpopcntdq -mavx512bw
>> alone, without needing -mxsave:
>
> Oh, I see. I'll work on a patch to remove that co
On Tue, Jul 30, 2024 at 07:43:08PM -0700, Andres Freund wrote:
> On 2024-07-30 21:01:31 -0500, Nathan Bossart wrote:
>> The main purpose of the XSAVE compiler check is to determine whether we
>> need to add -mxsave in order to use _xgetbv() [0]. If that wasn't a
>> factor, we could probably skip i
Hi,
On 2024-07-30 21:01:31 -0500, Nathan Bossart wrote:
> On Tue, Jul 30, 2024 at 06:46:51PM -0700, Andres Freund wrote:
> > On 2024-07-30 20:20:34 -0500, Nathan Bossart wrote:
> >> On Tue, Jul 30, 2024 at 05:49:59PM -0700, Andres Freund wrote:
> >> > Why are we actually checking for xsave? We're
On Tue, Jul 30, 2024 at 06:46:51PM -0700, Andres Freund wrote:
> On 2024-07-30 20:20:34 -0500, Nathan Bossart wrote:
>> On Tue, Jul 30, 2024 at 05:49:59PM -0700, Andres Freund wrote:
>> > Why are we actually checking for xsave? We're not using xsave itself and I
>> > couldn't find a comment in 7927
Hi,
On 2024-07-30 20:20:34 -0500, Nathan Bossart wrote:
> On Tue, Jul 30, 2024 at 05:49:59PM -0700, Andres Freund wrote:
> > Ah, I somehow thought we'd avoid the runtime check in case we determine at
> > compile time we don't need any extra flags to enable the AVX512 stuff
> > (similar
> > to how
On Tue, Jul 30, 2024 at 05:49:59PM -0700, Andres Freund wrote:
> Ah, I somehow thought we'd avoid the runtime check in case we determine at
> compile time we don't need any extra flags to enable the AVX512 stuff (similar
> to how we deal with crc32). But it looks like that's not the case - which
>
On Wed, Jul 31, 2024 at 12:50 PM Andres Freund wrote:
> It's one thing for the avx512 path to have that overhead, but it's
> particularly absurd for pg_popcount32/pg_popcount64, where
>
> a) The function call overhead is a larger proportion of the cost.
> b) the instruction is almost universally a
Hi,
On 2024-07-30 16:32:07 -0500, Nathan Bossart wrote:
> On Tue, Jul 30, 2024 at 02:07:01PM -0700, Andres Freund wrote:
> > Now, a reasonable counter-argument would be that only some of these macros
> > are
> > defined for msvc ([1]). However, as it turns out, the test is broken
> > today, as m
On Tue, Jul 30, 2024 at 04:32:07PM -0500, Nathan Bossart wrote:
> On Tue, Jul 30, 2024 at 02:07:01PM -0700, Andres Freund wrote:
>> Afaict we could just check for predefined preprocessor macros:
>>
>> echo|time gcc -c -mxsave -mavx512vpopcntdq -mavx512bw -xc -dM -E - -o
>> -|grep -E '__XSAVE__|_
ES
> [7.481] Checking if "XSAVE intrinsics without -mxsave" : links: NO
> [8.097] Checking if "XSAVE intrinsics with -mxsave" : links: YES
> [8.641] Checking if "AVX-512 popcount without -mavx512vpopcntdq -mavx512bw" :
> links: NO
> [9.183] Checking
hecking if " __atomic_compare_exchange_n(int32)" : links: YES
[6.940] Checking if " __atomic_compare_exchange_n(int64)" : links: YES
[7.481] Checking if "XSAVE intrinsics without -mxsave" : links: NO
[8.097] Checking if "XSAVE intrinsics with -mxsave" : links: YE
On Thu, Apr 18, 2024 at 05:13:58PM -0500, Nathan Bossart wrote:
> Makes sense, thanks. I'm planning to commit this fix sometime early next
> week.
Committed.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Thu, Apr 18, 2024 at 10:11:08PM +, Devulapalli, Raghuveer wrote:
>> On that note, is it necessary to also check for avx512f? At the moment,
>> we are assuming that's supported if the other AVX-512 instructions are
>> available.
>
> No, it's not needed. There are no CPU's with avx512bw/avx5
> On that note, is it necessary to also check for avx512f? At the moment, we
> are assuming that's supported if the other AVX-512 instructions are available.
No, it's not needed. There are no CPU's with avx512bw/avx512popcnt without
avx512f. Unfortunately though, avx512popcnt does not mean avx
here's support for XSAVE instructions?
*/
-bool
-pg_popcount_avx512_available(void)
+static inline bool
+xsave_available(void)
{
unsigned int exx[4] = {0, 0, 0, 0};
- /* Does CPUID say there's support for AVX-512 popcount instructions? */
-#if defined(HAVE__GET_CPUID_COUNT)
- __get_cpui
> Thanks for the feedback. I've attached an updated patch.
(1) Shouldn't it be: return (_xgetbv(0) & 0xe6) == 0xe6; ? Otherwise
zmm_regs_available() will return false.
(2) Nitpick: avx512_popcnt_available and avx512_bw_available() run the same
cpuid leaf. You could combine them into one to avo
equired for the AVX-512
- * pg_popcount() implementation.
+ * Does CPUID say there's support for XSAVE instructions?
*/
-bool
-pg_popcount_avx512_available(void)
+static inline bool
+xsave_available(void)
{
unsigned int exx[4] = {0, 0, 0, 0};
- /* Does CPUID say there's support fo
> (Step 3 can be done in any order relative to 1 and 2.)"
Thanks for confirming. IIUC my patch should be sufficient, then.
> It also seems that step 1 and step 2 need to be done prior to the CPUID
> OSXSAVE check in the popcount code.
This seems to contradict the note about doin
ons supported).
(Step 3 can be done in any order relative to 1 and 2.)"
It also seems that step 1 and step 2 need to be done prior to the CPUID OSXSAVE
check in the popcount code.
[0]: https://cdrdv2.intel.com/v1/dl/getContent/671200
- Akash Shankaran
It was brought to my attention [0] that we probably should be checking for
the OSXSAVE bit instead of the XSAVE bit when determining whether there's
support for the XGETBV instruction. IIUC that should indicate that both
the OS and the processor have XGETBV support (not just the processor).
I've a
Nathan Bossart writes:
> On Sun, Apr 07, 2024 at 08:23:32PM -0500, Nathan Bossart wrote:
>> The Intel documentation for _mm256_undefined_si256() [0]
>> indicates that it is intended to return "undefined elements," so it seems
>> like the use of an uninitialized variable might be intentional.
> Se
On Sun, Apr 07, 2024 at 08:23:32PM -0500, Nathan Bossart wrote:
> The Intel documentation for _mm256_undefined_si256() [0]
> indicates that it is intended to return "undefined elements," so it seems
> like the use of an uninitialized variable might be intentional.
See also https://gcc.gnu.org/git/
On Sun, Apr 07, 2024 at 08:42:12PM -0400, Tom Lane wrote:
> Today's Coverity run produced this warning, which seemingly was
> triggered by one of these commits, but I can't make much sense
> of it:
>
> *** CID 1596255: Uninitialized variables (UNINIT)
> /usr/lib/gcc/x86_64-linux-gnu/10/include/a
Nathan Bossart writes:
> Here is what I have staged for commit, which I intend to do shortly.
Today's Coverity run produced this warning, which seemingly was
triggered by one of these commits, but I can't make much sense
of it:
*** CID 1596255: Uninitialized variables (UNINIT)
/usr/lib/gcc/x86
On Sat, Apr 06, 2024 at 02:41:01PM -0500, Nathan Bossart wrote:
> Here is what I have staged for commit, which I intend to do shortly.
Committed.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
{pgac_cv_xsave_intrinsics__mxsave+:} false; then :
+ $as_echo_n "(cached) " >&6
+else
+ pgac_save_CFLAGS=$CFLAGS
+CFLAGS="$pgac_save_CFLAGS -mxsave"
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h. */
+#include
+int
+main ()
+{
+return _xgetbv(0) &am
On Sat, 6 Apr 2024 at 14:17, Nathan Bossart wrote:
>
> On Sat, Apr 06, 2024 at 12:08:14PM +1300, David Rowley wrote:
> > Won't Valgrind complain about this?
> >
> > +pg_popcount_avx512(const char *buf, int bytes)
> >
> > + buf = (const char *) TYPEALIGN_DOWN(sizeof(__m512i), buf);
> >
> > + val =
On Sat, Apr 06, 2024 at 12:08:14PM +1300, David Rowley wrote:
> Won't Valgrind complain about this?
>
> +pg_popcount_avx512(const char *buf, int bytes)
>
> + buf = (const char *) TYPEALIGN_DOWN(sizeof(__m512i), buf);
>
> + val = _mm512_maskz_loadu_epi8(mask, (const __m512i *) buf);
I haven't be
On Sat, 6 Apr 2024 at 04:38, Nathan Bossart wrote:
> This seems to provide a small performance boost, so I've incorporated it
> into v27.
Won't Valgrind complain about this?
+pg_popcount_avx512(const char *buf, int bytes)
+ buf = (const char *) TYPEALIGN_DOWN(sizeof(__m512i), buf);
+ val = _mm
Bossart
Amazon Web Services: https://aws.amazon.com
>From 9fc4b7556b72d51fce676db84b446099767efff3 Mon Sep 17 00:00:00 2001
From: Nathan Bossart
Date: Wed, 27 Mar 2024 16:39:24 -0500
Subject: [PATCH v27 1/2] AVX512 popcount support
---
config/c-compiler.m4 | 58 ++
configu
On Fri, Apr 05, 2024 at 10:33:27AM +0300, Ants Aasma wrote:
> The main issue I saw was that clang was able to peel off the first
> iteration of the loop and then eliminate the mask assignment and
> replace masked load with a memory operand for vpopcnt. I was not able
> to convince gcc to do that re
On Fri, 5 Apr 2024 at 07:15, Nathan Bossart wrote:
> Here is an updated patch set. IMHO this is in decent shape and is
> approaching committable.
I checked the code generation on various gcc and clang versions. It
looks mostly fine starting from versions where avx512 is supported,
gcc-7.1 and cl
ect: [PATCH v26 1/2] AVX512 popcount support
---
config/c-compiler.m4 | 58 ++
configure| 252 +++
configure.ac | 51 ++
meson.build | 87 +
src/Makefile.global
On Thu, Apr 04, 2024 at 04:02:53PM +0300, Ants Aasma wrote:
> Speaking of which, what does bumping up the inlined version threshold
> to 16 do with and without AVX-512 available? Linearly extrapolating
> the 2 and 4 byte numbers it might just come ahead in both cases,
> making the choice easy.
IIR
On Thu, Apr 04, 2024 at 04:28:58PM +1300, David Rowley wrote:
> On Thu, 4 Apr 2024 at 11:50, Nathan Bossart wrote:
>> If we can verify this approach won't cause segfaults and can stomach the
>> regression between 8 and 16 bytes, I'd happily pivot to this approach so
>> that we can avoid the functi
On Thu, 4 Apr 2024 at 01:50, Nathan Bossart wrote:
> If we can verify this approach won't cause segfaults and can stomach the
> regression between 8 and 16 bytes, I'd happily pivot to this approach so
> that we can avoid the function call dance that I have in v25.
The approach I posted does not r
On Thu, 4 Apr 2024 at 11:50, Nathan Bossart wrote:
> If we can verify this approach won't cause segfaults and can stomach the
> regression between 8 and 16 bytes, I'd happily pivot to this approach so
> that we can avoid the function call dance that I have in v25.
>
> Thoughts?
If we're worried a
On Tue, Apr 02, 2024 at 11:30:39PM +0300, Ants Aasma wrote:
> On Tue, 2 Apr 2024 at 00:31, Nathan Bossart wrote:
>> On Tue, Apr 02, 2024 at 12:11:59AM +0300, Ants Aasma wrote:
>> > What about using the masking capabilities of AVX-512 to handle the
>> > tail in the same code path? Masked out portio
1 - 100 of 235 matches
Mail list logo