On Thu, Nov 07, 2024 at 08:38:21PM +, Devulapalli, Raghuveer wrote:
>
>> Of course, as soon as I committed this, I noticed that it's broken. It
>> seems that
>> compilers are rather picky about how multiple target options are specified.
>
> Just curious, which compiler complained?
Clang.
> Of course, as soon as I committed this, I noticed that it's broken. It seems
> that
> compilers are rather picky about how multiple target options are specified.
Just curious, which compiler complained?
Raghuveer
On Thu, Nov 07, 2024 at 02:03:04PM -0600, Nathan Bossart wrote:
> Committed.
Of course, as soon as I committed this, I noticed that it's broken. It
seems that compilers are rather picky about how multiple target options are
specified. AFAICT the commonly supported syntax is to put the entire lis
Committed.
--
nathan
On Thu, Nov 07, 2024 at 11:12:37AM -0500, Andres Freund wrote:
> One thing that'd I'd like to see this being used is to elide the indirection
> when the current target platform *already* supports the necessary
> intrinsics. Adding a bunch of indirection for short & common operations is
> decidedly
Hi,
On 2024-11-06 20:26:47 -0600, Nathan Bossart wrote:
> From d0fb7e0e375f7b76d4df90910c21e9448dd3b380 Mon Sep 17 00:00:00 2001
> From: Nathan Bossart
> Date: Wed, 16 Oct 2024 15:57:55 -0500
> Subject: [PATCH v3 1/1] use __attribute__((target(...))) for AVX-512 stuff
One thing that'd I'd like t
rebased
--
nathan
>From d0fb7e0e375f7b76d4df90910c21e9448dd3b380 Mon Sep 17 00:00:00 2001
From: Nathan Bossart
Date: Wed, 16 Oct 2024 15:57:55 -0500
Subject: [PATCH v3 1/1] use __attribute__((target(...))) for AVX-512 stuff
---
config/c-compiler.m4 | 64 +-
configure
On Thu, Oct 31, 2024 at 07:58:06PM +, Devulapalli, Raghuveer wrote:
> LGTM.
Thanks. Barring additional feedback, I plan to commit this soon.
--
nathan
On Wed, Oct 30, 2024 at 04:10:10PM -0500, Nathan Bossart wrote:
> On Wed, Oct 30, 2024 at 08:53:10PM +, Raghuveer Devulapalli wrote:
>> BTW, I just realized function attributes for xsave and avx512 don't work
>> on MSVC (see
>> https://developercommunity.visualstudio.com/t/support-function-targ
> Here is an updated patch with this change.
LGTM.
Raghuveer
> Oh, good catch. IIUC we only need to check for #ifndef _MSC_VER in the
> configure programs for meson. pg_attribute_target will be empty on MSVC, and
> I
> believe we only support meson builds there.
Right. __has_attribute (target) produces a compiler warning on MSVC:
https://gcc.godbolt.o
On Wed, Oct 30, 2024 at 08:53:10PM +, Raghuveer Devulapalli wrote:
> BTW, I just realized function attributes for xsave and avx512 don't work
> on MSVC (see
> https://developercommunity.visualstudio.com/t/support-function-target-attribute-and-mutiversioning/10130630).
> Not sure if you care abo
BTW, I just realized function attributes for xsave and avx512 don't work on
MSVC (see
https://developercommunity.visualstudio.com/t/support-function-target-attribute-and-mutiversioning/10130630).
Not sure if you care about it. Its an easy fix (see
https://gcc.godbolt.org/z/Pebdj3vMx).
The following review has been posted through the commitfest application:
make installcheck-world: tested, failed
Implements feature: tested, failed
Spec compliant: tested, failed
Documentation:tested, failed
Changes LGTM. Makes the Makefile look clean. Built and ran te
On Tue, Oct 08, 2024 at 09:36:03PM -0500, Nathan Bossart wrote:
> On Wed, Jul 31, 2024 at 04:43:02PM -0500, Nathan Bossart wrote:
>> On Wed, Jul 31, 2024 at 01:52:54PM -0700, Andres Freund wrote:
>>> I think we'd be better off enabling architectural features on a per-function
>>> basis, roughly lik
On Wed, Jul 31, 2024 at 04:43:02PM -0500, Nathan Bossart wrote:
> On Wed, Jul 31, 2024 at 01:52:54PM -0700, Andres Freund wrote:
>> I think we'd be better off enabling architectural features on a per-function
>> basis, roughly like this:
>>
>> [...]
>>
>> /* FIXME: Should be gated by configure che
On Wed, Jul 31, 2024 at 01:52:54PM -0700, Andres Freund wrote:
> On 2024-07-30 22:12:18 -0500, Nathan Bossart wrote:
>> As I started on this, I remembered why I needed it. The file
>> pg_popcount_avx512_choose.c is compiled without the AVX-512 flags in order
>> to avoid inadvertently issuing any A
Hi,
On 2024-07-30 22:12:18 -0500, Nathan Bossart wrote:
> On Tue, Jul 30, 2024 at 10:01:50PM -0500, Nathan Bossart wrote:
> > On Tue, Jul 30, 2024 at 07:43:08PM -0700, Andres Freund wrote:
> >> My point is that _xgetbv() is made available by -mavx512vpopcntdq
> >> -mavx512bw
> >> alone, without n
On Tue, Jul 30, 2024 at 10:01:50PM -0500, Nathan Bossart wrote:
> On Tue, Jul 30, 2024 at 07:43:08PM -0700, Andres Freund wrote:
>> My point is that _xgetbv() is made available by -mavx512vpopcntdq -mavx512bw
>> alone, without needing -mxsave:
>
> Oh, I see. I'll work on a patch to remove that co
On Tue, Jul 30, 2024 at 07:43:08PM -0700, Andres Freund wrote:
> On 2024-07-30 21:01:31 -0500, Nathan Bossart wrote:
>> The main purpose of the XSAVE compiler check is to determine whether we
>> need to add -mxsave in order to use _xgetbv() [0]. If that wasn't a
>> factor, we could probably skip i
Hi,
On 2024-07-30 21:01:31 -0500, Nathan Bossart wrote:
> On Tue, Jul 30, 2024 at 06:46:51PM -0700, Andres Freund wrote:
> > On 2024-07-30 20:20:34 -0500, Nathan Bossart wrote:
> >> On Tue, Jul 30, 2024 at 05:49:59PM -0700, Andres Freund wrote:
> >> > Why are we actually checking for xsave? We're
On Tue, Jul 30, 2024 at 06:46:51PM -0700, Andres Freund wrote:
> On 2024-07-30 20:20:34 -0500, Nathan Bossart wrote:
>> On Tue, Jul 30, 2024 at 05:49:59PM -0700, Andres Freund wrote:
>> > Why are we actually checking for xsave? We're not using xsave itself and I
>> > couldn't find a comment in 7927
Hi,
On 2024-07-30 20:20:34 -0500, Nathan Bossart wrote:
> On Tue, Jul 30, 2024 at 05:49:59PM -0700, Andres Freund wrote:
> > Ah, I somehow thought we'd avoid the runtime check in case we determine at
> > compile time we don't need any extra flags to enable the AVX512 stuff
> > (similar
> > to how
On Tue, Jul 30, 2024 at 05:49:59PM -0700, Andres Freund wrote:
> Ah, I somehow thought we'd avoid the runtime check in case we determine at
> compile time we don't need any extra flags to enable the AVX512 stuff (similar
> to how we deal with crc32). But it looks like that's not the case - which
>
On Wed, Jul 31, 2024 at 12:50 PM Andres Freund wrote:
> It's one thing for the avx512 path to have that overhead, but it's
> particularly absurd for pg_popcount32/pg_popcount64, where
>
> a) The function call overhead is a larger proportion of the cost.
> b) the instruction is almost universally a
Hi,
On 2024-07-30 16:32:07 -0500, Nathan Bossart wrote:
> On Tue, Jul 30, 2024 at 02:07:01PM -0700, Andres Freund wrote:
> > Now, a reasonable counter-argument would be that only some of these macros
> > are
> > defined for msvc ([1]). However, as it turns out, the test is broken
> > today, as m
On Tue, Jul 30, 2024 at 04:32:07PM -0500, Nathan Bossart wrote:
> On Tue, Jul 30, 2024 at 02:07:01PM -0700, Andres Freund wrote:
>> Afaict we could just check for predefined preprocessor macros:
>>
>> echo|time gcc -c -mxsave -mavx512vpopcntdq -mavx512bw -xc -dM -E - -o
>> -|grep -E '__XSAVE__|_
On Tue, Jul 30, 2024 at 02:07:01PM -0700, Andres Freund wrote:
> I've noticed that the configure probes for this are quite slow - pretty much
> the slowest step in a meson setup (and autoconf is similar). While looking
> into this, I also noticed that afaict the tests don't do the right thing for
Hi,
On 2024-04-23 11:02:07 -0500, Nathan Bossart wrote:
> On Thu, Apr 18, 2024 at 05:13:58PM -0500, Nathan Bossart wrote:
> > Makes sense, thanks. I'm planning to commit this fix sometime early next
> > week.
>
> Committed.
I've noticed that the configure probes for this are quite slow - pretty
On Thu, Apr 18, 2024 at 05:13:58PM -0500, Nathan Bossart wrote:
> Makes sense, thanks. I'm planning to commit this fix sometime early next
> week.
Committed.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Thu, Apr 18, 2024 at 10:11:08PM +, Devulapalli, Raghuveer wrote:
>> On that note, is it necessary to also check for avx512f? At the moment,
>> we are assuming that's supported if the other AVX-512 instructions are
>> available.
>
> No, it's not needed. There are no CPU's with avx512bw/avx5
> On that note, is it necessary to also check for avx512f? At the moment, we
> are assuming that's supported if the other AVX-512 instructions are available.
No, it's not needed. There are no CPU's with avx512bw/avx512popcnt without
avx512f. Unfortunately though, avx512popcnt does not mean avx
On Thu, Apr 18, 2024 at 09:29:55PM +, Devulapalli, Raghuveer wrote:
> (1) Shouldn't it be: return (_xgetbv(0) & 0xe6) == 0xe6; ? Otherwise
> zmm_regs_available() will return false..
Yes, that's a mistake. I fixed that in v3.
> (2) Nitpick: avx512_popcnt_available and avx512_bw_available() ru
> Thanks for the feedback. I've attached an updated patch.
(1) Shouldn't it be: return (_xgetbv(0) & 0xe6) == 0xe6; ? Otherwise
zmm_regs_available() will return false.
(2) Nitpick: avx512_popcnt_available and avx512_bw_available() run the same
cpuid leaf. You could combine them into one to avo
On Thu, Apr 18, 2024 at 08:24:03PM +, Devulapalli, Raghuveer wrote:
>> This seems to contradict the note about doing step 3 at any point, and
>> given step 1 is the OSXSAVE check, I'm not following what this means,
>> anyway.
>
> It is recommended that you run the xgetbv code before you check
On Thu, Apr 18, 2024 at 06:12:22PM +, Shankaran, Akash wrote:
> Good find. I confirmed after speaking with an intel expert, and from the
> intel AVX-512 manual [0] section 14.3, which recommends to check bit27. From
> the manual:
>
> "Prior to using Intel AVX, the application must identify t
> It was brought to my attention [0] that we probably should be checking for
> the OSXSAVE bit instead of the XSAVE bit when determining whether there's
> support for the XGETBV instruction. IIUC that should indicate that both the
> OS and the processor have XGETBV support (not just the process
It was brought to my attention [0] that we probably should be checking for
the OSXSAVE bit instead of the XSAVE bit when determining whether there's
support for the XGETBV instruction. IIUC that should indicate that both
the OS and the processor have XGETBV support (not just the processor).
I've a
Nathan Bossart writes:
> On Sun, Apr 07, 2024 at 08:23:32PM -0500, Nathan Bossart wrote:
>> The Intel documentation for _mm256_undefined_si256() [0]
>> indicates that it is intended to return "undefined elements," so it seems
>> like the use of an uninitialized variable might be intentional.
> Se
On Sun, Apr 07, 2024 at 08:23:32PM -0500, Nathan Bossart wrote:
> The Intel documentation for _mm256_undefined_si256() [0]
> indicates that it is intended to return "undefined elements," so it seems
> like the use of an uninitialized variable might be intentional.
See also https://gcc.gnu.org/git/
On Sun, Apr 07, 2024 at 08:42:12PM -0400, Tom Lane wrote:
> Today's Coverity run produced this warning, which seemingly was
> triggered by one of these commits, but I can't make much sense
> of it:
>
> *** CID 1596255: Uninitialized variables (UNINIT)
> /usr/lib/gcc/x86_64-linux-gnu/10/include/a
Nathan Bossart writes:
> Here is what I have staged for commit, which I intend to do shortly.
Today's Coverity run produced this warning, which seemingly was
triggered by one of these commits, but I can't make much sense
of it:
*** CID 1596255: Uninitialized variables (UNINIT)
/usr/lib/gcc/x86
On Sat, Apr 06, 2024 at 02:41:01PM -0500, Nathan Bossart wrote:
> Here is what I have staged for commit, which I intend to do shortly.
Committed.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Sat, Apr 06, 2024 at 02:51:39PM +1300, David Rowley wrote:
> On Sat, 6 Apr 2024 at 14:17, Nathan Bossart wrote:
>> On Sat, Apr 06, 2024 at 12:08:14PM +1300, David Rowley wrote:
>> > Won't Valgrind complain about this?
>> >
>> > +pg_popcount_avx512(const char *buf, int bytes)
>> >
>> > + buf = (
On Sat, 6 Apr 2024 at 14:17, Nathan Bossart wrote:
>
> On Sat, Apr 06, 2024 at 12:08:14PM +1300, David Rowley wrote:
> > Won't Valgrind complain about this?
> >
> > +pg_popcount_avx512(const char *buf, int bytes)
> >
> > + buf = (const char *) TYPEALIGN_DOWN(sizeof(__m512i), buf);
> >
> > + val =
On Sat, Apr 06, 2024 at 12:08:14PM +1300, David Rowley wrote:
> Won't Valgrind complain about this?
>
> +pg_popcount_avx512(const char *buf, int bytes)
>
> + buf = (const char *) TYPEALIGN_DOWN(sizeof(__m512i), buf);
>
> + val = _mm512_maskz_loadu_epi8(mask, (const __m512i *) buf);
I haven't be
On Sat, 6 Apr 2024 at 04:38, Nathan Bossart wrote:
> This seems to provide a small performance boost, so I've incorporated it
> into v27.
Won't Valgrind complain about this?
+pg_popcount_avx512(const char *buf, int bytes)
+ buf = (const char *) TYPEALIGN_DOWN(sizeof(__m512i), buf);
+ val = _mm
On Fri, Apr 05, 2024 at 07:58:44AM -0500, Nathan Bossart wrote:
> On Fri, Apr 05, 2024 at 10:33:27AM +0300, Ants Aasma wrote:
>> The main issue I saw was that clang was able to peel off the first
>> iteration of the loop and then eliminate the mask assignment and
>> replace masked load with a memor
On Fri, Apr 05, 2024 at 10:33:27AM +0300, Ants Aasma wrote:
> The main issue I saw was that clang was able to peel off the first
> iteration of the loop and then eliminate the mask assignment and
> replace masked load with a memory operand for vpopcnt. I was not able
> to convince gcc to do that re
On Fri, 5 Apr 2024 at 07:15, Nathan Bossart wrote:
> Here is an updated patch set. IMHO this is in decent shape and is
> approaching committable.
I checked the code generation on various gcc and clang versions. It
looks mostly fine starting from versions where avx512 is supported,
gcc-7.1 and cl
Here is an updated patch set. IMHO this is in decent shape and is
approaching committable.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From df59d3e78604e4530f5096bafc08ac94e13d82d2 Mon Sep 17 00:00:00 2001
From: Nathan Bossart
Date: Wed, 27 Mar 2024 16:39:24 -0500
Subject: [P
On Thu, Apr 04, 2024 at 04:02:53PM +0300, Ants Aasma wrote:
> Speaking of which, what does bumping up the inlined version threshold
> to 16 do with and without AVX-512 available? Linearly extrapolating
> the 2 and 4 byte numbers it might just come ahead in both cases,
> making the choice easy.
IIR
On Thu, Apr 04, 2024 at 04:28:58PM +1300, David Rowley wrote:
> On Thu, 4 Apr 2024 at 11:50, Nathan Bossart wrote:
>> If we can verify this approach won't cause segfaults and can stomach the
>> regression between 8 and 16 bytes, I'd happily pivot to this approach so
>> that we can avoid the functi
On Thu, 4 Apr 2024 at 01:50, Nathan Bossart wrote:
> If we can verify this approach won't cause segfaults and can stomach the
> regression between 8 and 16 bytes, I'd happily pivot to this approach so
> that we can avoid the function call dance that I have in v25.
The approach I posted does not r
On Thu, 4 Apr 2024 at 11:50, Nathan Bossart wrote:
> If we can verify this approach won't cause segfaults and can stomach the
> regression between 8 and 16 bytes, I'd happily pivot to this approach so
> that we can avoid the function call dance that I have in v25.
>
> Thoughts?
If we're worried a
On Tue, Apr 02, 2024 at 11:30:39PM +0300, Ants Aasma wrote:
> On Tue, 2 Apr 2024 at 00:31, Nathan Bossart wrote:
>> On Tue, Apr 02, 2024 at 12:11:59AM +0300, Ants Aasma wrote:
>> > What about using the masking capabilities of AVX-512 to handle the
>> > tail in the same code path? Masked out portio
On Wed, Apr 03, 2024 at 12:41:27PM -0500, Nathan Bossart wrote:
> I committed v23-0001. Here is a rebased version of the remaining patches.
> I intend to test the masking idea from Ants next.
0002 was missing a cast that is needed for the 32-bit builds. I've fixed
that in v25.
--
Nathan Bossar
I committed v23-0001. Here is a rebased version of the remaining patches.
I intend to test the masking idea from Ants next.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 295b03530de5f42fe876b4489191da2f8dc83194 Mon Sep 17 00:00:00 2001
From: Nathan Bossart
Date: Wed, 27 Ma
On Tue, Apr 02, 2024 at 05:20:20PM -0500, Nathan Bossart wrote:
> Sorry for the noise. I noticed a couple of silly mistakes immediately
> after sending v21.
Sigh... I missed a line while rebasing these patches, which seems to have
grossly offended cfbot. Apologies again for the noise.
--
Nath
On Tue, Apr 02, 2024 at 05:01:32PM -0500, Nathan Bossart wrote:
> In v21, 0001 is just the above inlining idea, which seems worth doing
> independent of $SUBJECT. 0002 and 0003 are the AVX-512 patches, which I've
> modified similarly to 0001, i.e., I've inlined the "fast" version in the
> function
On Tue, Apr 02, 2024 at 01:40:21PM -0500, Nathan Bossart wrote:
> On Tue, Apr 02, 2024 at 01:43:48PM -0400, Tom Lane wrote:
>> I don't like the double evaluation of the macro argument. Seems like
>> you could get the same results more safely with
>>
>> static inline uint64
>> pg_popcoun
On Tue, 2 Apr 2024 at 00:31, Nathan Bossart wrote:
> On Tue, Apr 02, 2024 at 12:11:59AM +0300, Ants Aasma wrote:
> > What about using the masking capabilities of AVX-512 to handle the
> > tail in the same code path? Masked out portions of a load instruction
> > will not generate an exception. To a
On Tue, Apr 02, 2024 at 01:43:48PM -0400, Tom Lane wrote:
> Alvaro Herrera writes:
>> On 2024-Apr-02, Nathan Bossart wrote:
>>> Another idea I had is to turn pg_popcount() into a macro that just uses the
>>> pg_number_of_ones array when called for few bytes:
>>>
>>> static inline uint64
>>>
Alvaro Herrera writes:
> On 2024-Apr-02, Nathan Bossart wrote:
>> Another idea I had is to turn pg_popcount() into a macro that just uses the
>> pg_number_of_ones array when called for few bytes:
>>
>> static inline uint64
>> pg_popcount_inline(const char *buf, int bytes)
>> {
>>
On 2024-Apr-02, Nathan Bossart wrote:
> Another idea I had is to turn pg_popcount() into a macro that just uses the
> pg_number_of_ones array when called for few bytes:
>
> static inline uint64
> pg_popcount_inline(const char *buf, int bytes)
> {
> uint64
On Mon, Apr 01, 2024 at 05:11:17PM -0500, Nathan Bossart wrote:
> Here is a v19 of the patch set. I moved out the refactoring of the
> function pointer selection code to 0001. I think this is a good change
> independent of $SUBJECT, and I plan to commit this soon. In 0002, I
> changed the syslog
On Tue, Apr 02, 2024 at 01:09:57AM +0300, Ants Aasma wrote:
> On Tue, 2 Apr 2024 at 00:31, Nathan Bossart wrote:
>> On Tue, Apr 02, 2024 at 12:11:59AM +0300, Ants Aasma wrote:
>> > What about using the masking capabilities of AVX-512 to handle the
>> > tail in the same code path? Masked out portio
Here is a v19 of the patch set. I moved out the refactoring of the
function pointer selection code to 0001. I think this is a good change
independent of $SUBJECT, and I plan to commit this soon. In 0002, I
changed the syslogger.c usage of pg_popcount() to use pg_number_of_ones
instead. This is
On Tue, 2 Apr 2024 at 00:31, Nathan Bossart wrote:
>
> On Tue, Apr 02, 2024 at 12:11:59AM +0300, Ants Aasma wrote:
> > What about using the masking capabilities of AVX-512 to handle the
> > tail in the same code path? Masked out portions of a load instruction
> > will not generate an exception. To
On Tue, Apr 02, 2024 at 12:11:59AM +0300, Ants Aasma wrote:
> What about using the masking capabilities of AVX-512 to handle the
> tail in the same code path? Masked out portions of a load instruction
> will not generate an exception. To allow byte level granularity
> masking, -mavx512bw is needed.
On Mon, 1 Apr 2024 at 18:53, Nathan Bossart wrote:
>
> On Mon, Apr 01, 2024 at 01:06:12PM +0200, Alvaro Herrera wrote:
> > On 2024-Mar-31, Nathan Bossart wrote:
> >> +popcnt = _mm512_reduce_add_epi64(accum);
> >> +return popcnt + pg_popcount_fast(buf, bytes);
> >
> > Hmm, doesn't this arra
On Mon, Apr 01, 2024 at 01:06:12PM +0200, Alvaro Herrera wrote:
> On 2024-Mar-31, Nathan Bossart wrote:
>> +popcnt = _mm512_reduce_add_epi64(accum);
>> +return popcnt + pg_popcount_fast(buf, bytes);
>
> Hmm, doesn't this arrangement cause an extra function call to
> pg_popcount_fast to be
On 2024-Mar-31, Nathan Bossart wrote:
> +uint64
> +pg_popcount_avx512(const char *buf, int bytes)
> +{
> + uint64 popcnt;
> + __m512i accum = _mm512_setzero_si512();
> +
> + for (; bytes >= sizeof(__m512i); bytes -= sizeof(__m512i))
> + {
> + const
On Sat, Mar 30, 2024 at 03:03:29PM -0500, Nathan Bossart wrote:
> My current plan is to add some new tests for
> pg_popcount() with many bytes, and then I'll give it a few more days for
> any additional feedback before committing.
Here is a v18 with a couple of new tests. Otherwise, it is the sam
I used John Naylor's test_popcount module [0] to put together the attached
graphs (note that the "small arrays" one is semi-logarithmic). For both
graphs, the X-axis is the number of 64-bit words in the array, and Y-axis
is the amount of time in milliseconds to run pg_popcount() on it 100,000
time
Here's a v17 of the patch. This one has configure checks for everything
(i.e., CPUID, XGETBV, and the AVX512 intrinsics) as well as the relevant
runtime checks (i.e., we call CPUID to check for XGETBV and AVX512 POPCNT
availability, and we call XGETBV to ensure the ZMM registers are enabled).
I re
On Fri, Mar 29, 2024 at 03:08:28PM -0500, Nathan Bossart wrote:
>> +#if defined(HAVE__GET_CPUID)
>> +__get_cpuid_count(7, 0, &exx[0], &exx[1], &exx[2], &exx[3]);
>> +#elif defined(HAVE__CPUID)
>> +__cpuidex(exx, 7, 0);
>
> Is there any reason we can't use __get_cpuid() and __cpuid() here,
On Fri, Mar 29, 2024 at 02:13:12PM -0500, Nathan Bossart wrote:
> * If the compiler understands AVX512 intrinsics, we assume that it also
> knows about the required CPUID and XGETBV intrinsics, and we assume that
> the conditions for TRY_POPCNT_FAST are true.
Bleh, cfbot's 32-bit build is unha
Okay, here is a slightly different approach that I've dubbed the "maximum
assumption" approach. In short, I wanted to see how much we could simplify
the patch by making all possibly-reasonable assumptions about the compiler
and CPU. These include:
* If the compiler understands AVX512 intrinsics,
> A counterexample is the CRC32C code. AFAICT we assume the presence of
> CPUID in that code (and #error otherwise). I imagine its probably safe to
> assume the compiler understands CPUID if it understands AVX512 intrinsics,
> but that is still mostly a guess.
If AVX-512 intrinsics are available
> On Thu, Mar 28, 2024 at 11:10:33PM +0100, Alvaro Herrera wrote:
> > We don't do MSVC via autoconf/Make. We used to have a special build
> > framework for MSVC which parsed Makefiles to produce "solution" files,
> > but it was removed as soon as Meson was mature enough to build. See
> > commit 1
On Fri, Mar 29, 2024 at 12:30:14PM -0400, Tom Lane wrote:
> Nathan Bossart writes:
>>> I see google web references to the xgetbv instruction as far back as 2009
>>> for Intel 64 bit HW and 2010 for AMD 64bit HW, maybe you could test for
>>> _xgetbv() MSVC built-in. How far back do you need to go?
> From: Nathan Bossart
> Sent: Friday, March 29, 2024 9:17 AM
> To: Amonson, Paul D
> On Fri, Mar 29, 2024 at 04:06:17PM +, Amonson, Paul D wrote:
>> Yeah, I understand that much, but I want to know how portable the
>> XGETBV instruction is. Unless I can assume that all x86_64 systems
>>
Nathan Bossart writes:
>> I see google web references to the xgetbv instruction as far back as 2009
>> for Intel 64 bit HW and 2010 for AMD 64bit HW, maybe you could test for
>> _xgetbv() MSVC built-in. How far back do you need to go?
> Hm. It seems unlikely that a compiler would understand AVX5
On Fri, Mar 29, 2024 at 10:59:40AM -0500, Nathan Bossart wrote:
> It might be nice if we conditionally built pg_popcount_avx512.o in autoconf
> builds, too, but AFAICT we still need to wrap most of that code with
> macros, so I'm not sure it's worth the trouble. I'll take another look at
> this...
On Fri, Mar 29, 2024 at 04:06:17PM +, Amonson, Paul D wrote:
>> Yeah, I understand that much, but I want to know how portable the XGETBV
>> instruction is. Unless I can assume that all x86_64 systems and compilers
>> support that instruction, we might need an additional configure check and/or
> -Original Message-
>
> Cool. I think we should run the benchmarks again to be safe, though.
Ok, sure go ahead. :)
> >> I forgot to mention that I also want to understand whether we can
> >> actually assume availability of XGETBV when CPUID says we support
> >> AVX512:
> >
> > You canno
On Thu, Mar 28, 2024 at 10:29:47PM +, Amonson, Paul D wrote:
> I see in the meson.build you added the new file twice?
>
> @@ -7,6 +7,7 @@ pgport_sources = [
>'noblock.c',
>'path.c',
>'pg_bitutils.c',
> + 'pg_popcount_avx512.c',
>'pg_strong_random.c',
>'pgcheckdir.c',
>
On Thu, Mar 28, 2024 at 11:10:33PM +0100, Alvaro Herrera wrote:
> We don't do MSVC via autoconf/Make. We used to have a special build
> framework for MSVC which parsed Makefiles to produce "solution" files,
> but it was removed as soon as Meson was mature enough to build. See
> commit 1301c80b216
On Thu, Mar 28, 2024 at 10:03:04PM +, Amonson, Paul D wrote:
>> * I think we need to verify there isn't a huge performance regression for
>> smaller arrays. IIUC those will still require an AVX512 instruction or
>> two as well as a function call, which might add some noticeable overhead.
>
> -Original Message-
> From: Amonson, Paul D
> Sent: Thursday, March 28, 2024 3:03 PM
> To: Nathan Bossart
> ...
> I will review the new patch to see if there are anything that jumps out at me.
I see in the meson.build you added the new file twice?
@@ -7,6 +7,7 @@ pgport_sources = [
On 2024-Mar-28, Amonson, Paul D wrote:
> > -Original Message-
> > From: Nathan Bossart
> > Sent: Thursday, March 28, 2024 2:39 PM
> > To: Amonson, Paul D
> >
> > * The latest patch set from Paul Amonson appeared to support MSVC in the
> > meson build, but not the autoconf one. I don'
> -Original Message-
> From: Nathan Bossart
> Sent: Thursday, March 28, 2024 2:39 PM
> To: Amonson, Paul D
>
> * The latest patch set from Paul Amonson appeared to support MSVC in the
> meson build, but not the autoconf one. I don't have much expertise here,
> so the v14 patch doesn
On Thu, Mar 28, 2024 at 04:38:54PM -0500, Nathan Bossart wrote:
> Here is a v14 of the patch that I think is beginning to approach something
> committable. Besides general review and testing, there are two things that
> I'd like to bring up:
>
> * The latest patch set from Paul Amonson appeared t
Here is a v14 of the patch that I think is beginning to approach something
committable. Besides general review and testing, there are two things that
I'd like to bring up:
* The latest patch set from Paul Amonson appeared to support MSVC in the
meson build, but not the autoconf one. I don't ha
> -Original Message-
> From: Nathan Bossart
> Sent: Wednesday, March 27, 2024 3:00 PM
> To: Amonson, Paul D
>
> ... (I realize that I'm essentially
> recanting much of my previous feedback, which I apologize for.)
It happens. LOL As long as the algorithm for AVX-512 is not altered I am
On Mon, Mar 25, 2024 at 03:05:51PM -0500, Nathan Bossart wrote:
> On Mon, Mar 25, 2024 at 06:42:36PM +, Amonson, Paul D wrote:
>> Ok, CI turned green after my re-post of the patches. Can this please get
>> merged?
>
> Thanks for the new patches. I intend to take another look soon.
Thanks fo
On Mon, Mar 25, 2024 at 06:42:36PM +, Amonson, Paul D wrote:
> Ok, CI turned green after my re-post of the patches. Can this please get
> merged?
Thanks for the new patches. I intend to take another look soon.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
s.postgresql.org
> Subject: RE: Popcount optimization using AVX512
>
Ok, CI turned green after my re-post of the patches. Can this please get
merged?
Thanks,
Paul
On 3/25/24 11:12, Tom Lane wrote:
"Amonson, Paul D" writes:
I am re-posting the patches as CI for Mac failed (CI error not code/test
error). The patches are the same as last time.
Just for a note --- the cfbot will re-test existing patches every
so often without needing a bump. The current
1 - 100 of 171 matches
Mail list logo