RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-09-24 Thread Amonson, Paul D
Hi all, I will be retiring from Intel at the end of this week. I wanted to introduce the engineer who will be taking over the CRC32c proposal and commit fest entry: Devulapalli, Raghuveer. I have brought him up to speed and he will be the go-to for technical review comments and questions. Plea

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-08-27 Thread Amonson, Paul D
> Things like sizeof() and offsetof() are known at compile time, so the compiler > will recognize when a condition is always true or false and optimize it out > accordingly. In cases where the value cannot be known at compile time, > checking the length in the macro and dispatching to a different
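For readers skimming the archive, the length-based dispatch discussed in the quoted text can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the patch set: SMALL_LEN_THRESHOLD, COMP_CRC32C_SKETCH, crc32c_small, crc32c_large and crc32c_of are hypothetical names, and a plain bitwise CRC32C stands in for both the SSE 4.2 and AVX-512 implementations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cutoff; the thread discusses a regression below 256 bytes. */
#define SMALL_LEN_THRESHOLD 256

/*
 * Reference bitwise CRC32C (Castagnoli, reflected polynomial 0x82F63B78).
 * It stands in for the real hardware-accelerated implementations.
 */
static uint32_t
crc32c_update(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = data;

    assert(p != NULL || len == 0);
    while (len--)
    {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}

/* Assumed "small input" path: a direct call, no function pointer. */
static uint32_t
crc32c_small(uint32_t crc, const void *data, size_t len)
{
    return crc32c_update(crc, data, len);
}

/* Assumed "large input" path: reached through a runtime-chosen pointer. */
static uint32_t (*crc32c_large) (uint32_t, const void *, size_t) = crc32c_update;

/*
 * When len is a compile-time constant such as sizeof(x), the compiler can
 * fold the comparison and keep only one branch. Note that len is evaluated
 * twice, so it must not have side effects.
 */
#define COMP_CRC32C_SKETCH(crc, data, len) \
    ((crc) = ((len) < SMALL_LEN_THRESHOLD) \
        ? crc32c_small((crc), (data), (len)) \
        : crc32c_large((crc), (data), (len)))

/* Convenience wrapper: initialize, accumulate, finalize. */
static uint32_t
crc32c_of(const void *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    COMP_CRC32C_SKETCH(crc, data, len);
    return crc ^ 0xFFFFFFFFu;
}
```

Calling crc32c_of("123456789", 9) takes the small path and yields the standard CRC-32C check value 0xE3069283; a buffer of 256 bytes or more goes through the function pointer instead.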

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-08-26 Thread Amonson, Paul D
> IMHO that would be useful to establish the current state of the patch set from > a performance standpoint, especially since you've added code intended to > mitigate the regression. Ok. > +#define COMP_CRC32C_SMALL(crc, data, len) \ > + ((crc) = pg_comp_crc32c_sse42((crc), (data), (len))) >

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-08-26 Thread Amonson, Paul D
> And this still shows the ~14% regression in your original post? At the small buffer sizes the margin of error or "noise" is larger, 7-11%. My average could be just bad luck. It will take me a while to re-setup for full data collection runs but I can try it again if you like. Paul

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-08-26 Thread Amonson, Paul D
> I'm curious about where exactly the regression is coming from. Is it possible > that your build for the SSE 4.2 tests was using it unconditionally, i.e., > optimizing away the function pointer? I am calling the SSE 4.2 implementation directly; I am not even building the pg_sse42_*_choose.c fil

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-08-26 Thread Amonson, Paul D
> Upthread [0], Andres suggested dispatching to a different implementation for > compile-time-known small lengths. Have you looked into that? In your > original post, you noted a 14% regression for records smaller than 256 bytes, > which is not an uncommon case for Postgres. IMO we should try to

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-08-22 Thread Amonson, Paul D
> Upthread [0], Andres suggested dispatching to a different implementation for > compile-time-known small lengths. Have you looked into that? In your > original post, you noted a 14% regression for records smaller than 256 bytes, > which is not an uncommon case for Postgres. IMO we should try to

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-08-22 Thread Amonson, Paul D
Hi, Here are the latest patches for the accelerated CRC32c algorithm. I did the following to create these refactored patches: 1) From the main branch I moved all x86_64 hardware checks from the various locations into a single location. I did not move any ARM tests as I would have no way to tes

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-07-18 Thread Amonson, Paul D
> Okay, that is very interesting. Yes, we will have no problem reproducing the > exact license text in the source code. I think we can remove the license > issue as a blocker for this patch. Hi, I was wondering if I can get a review please. I am interested in the refactor question for the

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-06-25 Thread Amonson, Paul D
> It would be good to know exactly what, if any, changes the Intel lawyers want > us to make to our license if we accept this patch. I asked about this and there is nothing Intel requires here license-wise. They believe that there is nothing wrong with including 3-Clause BSD-like licenses under

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-06-18 Thread Amonson, Paul D
> Hmm, I wonder if the "(c) 2024 Intel" line is going to bring us trouble. > (I bet it's not really necessary anyway.) Our lawyer agrees, copyright is covered by the "PostgreSQL Global Development Group" copyright line as a contributor. > And this bit doesn't look good. The LICENSE file says: .

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-06-17 Thread Amonson, Paul D
> This is extremely workload dependent, it's not hard to find workloads with > lots of very small records and very few big ones... What you observed might > have "just" been the warmup behaviour where more full page writes have to > be written. Can you tell me how to avoid capturing this "warm-up"

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-06-12 Thread Amonson, Paul D
> -Original Message- > From: Andres Freund > Sent: Wednesday, June 12, 2024 1:12 PM > To: Amonson, Paul D > FWIW, I tried the v2 patch on my Xeon Gold 5215 workstation, and dies early > on with SIGILL: Nice catch!!! I was testing the bit for the vpclmulqdq in

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-06-12 Thread Amonson, Paul D
> The project is currently in feature-freeze in preparation for the next major > release so new development and ideas are not the top priority right now. > Additionally there is a large developer meeting shortly which many are busy > preparing for. Exercise some patience, and I'm sure there will

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-05-17 Thread Amonson, Paul D
Hi, forgive the top-post but I have not seen any response to this post? Thanks, Paul > -Original Message- > From: Amonson, Paul D > Sent: Wednesday, May 1, 2024 8:56 AM > To: pgsql-hackers@lists.postgresql.org > Cc: Nathan Bossart ; Shankaran, Akash > > Subject:

Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-05-01 Thread Amonson, Paul D
Hi, Comparing the current SSE4.2 implementation of the CRC32C algorithm in Postgres to an optimized AVX-512 algorithm [0], we observed significant gains: a ~6.6X average performance improvement measured on 3 different Intel products. Details below. The AVX-512 algorit

RE: Popcount optimization using AVX512

2024-03-29 Thread Amonson, Paul D
> A counterexample is the CRC32C code. AFAICT we assume the presence of > CPUID in that code (and #error otherwise). I imagine it's probably safe to > assume the compiler understands CPUID if it understands AVX512 intrinsics, > but that is still mostly a guess. If AVX-512 intrinsics are available

RE: Popcount optimization using AVX512

2024-03-29 Thread Amonson, Paul D
> On Thu, Mar 28, 2024 at 11:10:33PM +0100, Alvaro Herrera wrote: > > We don't do MSVC via autoconf/Make. We used to have a special build > > framework for MSVC which parsed Makefiles to produce "solution" files, > > but it was removed as soon as Meson was mature enough to build. See > > commit 1

RE: Popcount optimization using AVX512

2024-03-29 Thread Amonson, Paul D
> -Original Message- > > Cool. I think we should run the benchmarks again to be safe, though. Ok, sure go ahead. :) > >> I forgot to mention that I also want to understand whether we can > >> actually assume availability of XGETBV when CPUID says we support > >> AVX512: > > > > You canno
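As background to the XGETBV question quoted above, here is a minimal sketch of the usual decision logic, written as a pure function over register values so it can be read and tested without executing CPUID. The function name avx512f_usable and the macro names are hypothetical and not taken from the patch; the bit positions follow the Intel SDM (CPUID.1:ECX bit 27 for OSXSAVE, CPUID.(EAX=7,ECX=0):EBX bit 16 for AVX512F, XCR0 bits 1, 2, 5, 6 and 7 for SSE, AVX, opmask, ZMM_Hi256 and Hi16_ZMM state).

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID leaf 1, ECX: the OS has enabled XSAVE, so XGETBV may be executed. */
#define CPUID1_ECX_OSXSAVE  (1u << 27)

/* CPUID leaf 7, subleaf 0, EBX: AVX-512 Foundation instructions present. */
#define CPUID7_EBX_AVX512F  (1u << 16)

/*
 * XCR0 state bits that the OS must enable before ZMM registers are safe:
 * SSE (1), AVX (2), opmask (5), ZMM_Hi256 (6), Hi16_ZMM (7) = 0xE6.
 */
#define XCR0_AVX512_STATE   ((uint64_t) 0xE6)

/*
 * Hypothetical helper: decide whether AVX-512F code may run.
 *
 * leaf1_ecx: ECX from CPUID leaf 1
 * leaf7_ebx: EBX from CPUID leaf 7, subleaf 0
 * xcr0:      result of XGETBV(0); a real caller must only execute XGETBV
 *            after seeing OSXSAVE set, and can pass 0 otherwise
 */
static bool
avx512f_usable(uint32_t leaf1_ecx, uint32_t leaf7_ebx, uint64_t xcr0)
{
    if ((leaf1_ecx & CPUID1_ECX_OSXSAVE) == 0)
        return false;           /* XGETBV itself is not available */
    if ((leaf7_ebx & CPUID7_EBX_AVX512F) == 0)
        return false;           /* CPU lacks AVX-512F */
    /* CPU support alone is not enough: the OS must save/restore ZMM state. */
    return (xcr0 & XCR0_AVX512_STATE) == XCR0_AVX512_STATE;
}
```

With XCR0 = 0xE6 and both CPUID bits set the function returns true; with XCR0 = 0x06 (AVX state enabled but not the AVX-512 state) it returns false even though the CPU reports AVX512F.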

RE: Popcount optimization using AVX512

2024-03-28 Thread Amonson, Paul D
> -Original Message- > From: Amonson, Paul D > Sent: Thursday, March 28, 2024 3:03 PM > To: Nathan Bossart > ... > I will review the new patch to see if there is anything that jumps out at me. I see in the meson.build you added the new file twice? @@ -7,6 +7,7

RE: Popcount optimization using AVX512

2024-03-28 Thread Amonson, Paul D
> -Original Message- > From: Nathan Bossart > Sent: Thursday, March 28, 2024 2:39 PM > To: Amonson, Paul D > > * The latest patch set from Paul Amonson appeared to support MSVC in the > meson build, but not the autoconf one. I don't have much expertise

RE: Popcount optimization using AVX512

2024-03-27 Thread Amonson, Paul D
> -Original Message- > From: Nathan Bossart > Sent: Wednesday, March 27, 2024 3:00 PM > To: Amonson, Paul D > > ... (I realize that I'm essentially > recanting much of my previous feedback, which I apologize for.) It happens. LOL As long as the algorithm fo

RE: Popcount optimization using AVX512

2024-03-25 Thread Amonson, Paul D
> -Original Message- > From: Amonson, Paul D > Sent: Monday, March 25, 2024 8:20 AM > To: Tom Lane > Cc: David Rowley ; Nathan Bossart > ; Andres Freund ; Alvaro > Herrera ; Shankaran, Akash > ; Noah Misch ; Matthias > van de Meent ; pgsql- > hack...@list

RE: Popcount optimization using AVX512

2024-03-25 Thread Amonson, Paul D
> -Original Message- > From: Tom Lane > Sent: Monday, March 25, 2024 8:12 AM > To: Amonson, Paul D > Cc: David Rowley ; Nathan Bossart > Subject: Re: Popcount optimization using AVX512 >... > Just for a note --- the cfbot will re-test existing patches every so of

RE: Popcount optimization using AVX512

2024-03-25 Thread Amonson, Paul D
> -Original Message- > From: Amonson, Paul D > Sent: Thursday, March 21, 2024 12:18 PM > To: David Rowley > Cc: Nathan Bossart ; Andres Freund I am re-posting the patches as CI for Mac failed (CI error not code/test error). The patches are the same as last time. Than

RE: Popcount optimization using AVX512

2024-03-21 Thread Amonson, Paul D
> -Original Message- > From: David Rowley > Sent: Wednesday, March 20, 2024 5:28 PM > To: Amonson, Paul D > Cc: Nathan Bossart ; Andres Freund > > I'm not sure about this "extern negates inline" comment. It seems to me the > compiler is perfectly f

RE: Popcount optimization using AVX512

2024-03-20 Thread Amonson, Paul D
> -Original Message- > From: David Rowley > Sent: Tuesday, March 19, 2024 9:26 PM > To: Amonson, Paul D > > AMD's Zen4 also has AVX512, so it's misleading to indicate it's an Intel only > instruction. Also, writing the date isn't necessary as

RE: Popcount optimization using AVX512

2024-03-19 Thread Amonson, Paul D
> -Original Message- > From: Nathan Bossart > > Committed. Thanks for the suggestion and for reviewing! > > Paul, I suspect your patches will need to be rebased after commit cc4826d. > Would you mind doing so? Changed in this patch set. * Rebased. * Direct *slow* calls via macros as sh

RE: Popcount optimization using AVX512

2024-03-18 Thread Amonson, Paul D
> -Original Message- > From: Nathan Bossart > Sent: Monday, March 18, 2024 2:08 PM > To: David Rowley > Cc: Amonson, Paul D ; Andres Freund >... > > The only reason I left it out was because I couldn't convince myself that it > wasn't dead code, give

RE: Popcount optimization using AVX512

2024-03-18 Thread Amonson, Paul D
> -Original Message- > From: Nathan Bossart > Sent: Monday, March 18, 2024 9:20 AM > ... > I don't think David was suggesting that we need to remove the runtime checks > for AVX512. IIUC he was pointing out that most of the performance gain is > from removing the function call overhead, w

RE: Popcount optimization using AVX512

2024-03-18 Thread Amonson, Paul D
om: Nathan Bossart > Sent: Monday, March 18, 2024 8:29 AM > To: David Rowley > Cc: Amonson, Paul D ; Andres Freund > ; Alvaro Herrera ; Shankaran, > Akash ; Noah Misch ; > Tom Lane ; Matthias van de Meent > ; pgsql-hackers@lists.postgresql.org > Subject: Re: Popcount optimizati

RE: Popcount optimization using AVX512

2024-03-15 Thread Amonson, Paul D
> -Original Message- > From: Amonson, Paul D > Sent: Friday, March 15, 2024 8:31 AM > To: Nathan Bossart ... > When I tested the code outside postgres in a micro benchmark I got 200- > 300% improvements. Your results are interesting, as it implies more than > 300% i

RE: Popcount optimization using AVX512

2024-03-15 Thread Amonson, Paul D
> -Original Message- > From: Nathan Bossart > Sent: Friday, March 15, 2024 8:06 AM > To: Amonson, Paul D > Cc: Andres Freund ; Alvaro Herrera ; Shankaran, Akash ; Noah Misch > ; Tom Lane ; Matthias van de > Meent ; pgsql- > hack...@lists.postgresql.

RE: Popcount optimization using AVX512

2024-03-14 Thread Amonson, Paul D
> -Original Message- > From: Nathan Bossart > Sent: Monday, March 11, 2024 6:35 PM > To: Amonson, Paul D > Thanks. There's no need to wait to post the AVX portion. I recommend using > "git format-patch" to construct the patch set for the lists.

RE: Popcount optimization using AVX512

2024-03-13 Thread Amonson, Paul D
> -Original Message- > From: Nathan Bossart > Sent: Wednesday, March 13, 2024 9:39 AM > To: Amonson, Paul D > +extern int pg_popcount32_slow(uint32 word); extern int > +pg_popcount64_slow(uint64 word); > > +/* In pg_popcnt_*_accel source file. */ extern i
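For context on the "_slow" declarations quoted above, a portable popcount with no special instructions can be sketched as below. This is an illustrative assumption only: popcount32_portable and popcount64_portable are hypothetical names, and the loop shown (clearing the lowest set bit each iteration) is not claimed to be what the PostgreSQL functions do.

```c
#include <stdint.h>

/*
 * Hypothetical portable fallback: count set bits by repeatedly clearing
 * the lowest set bit. Runs one iteration per set bit.
 */
static int
popcount32_portable(uint32_t word)
{
    int         count = 0;

    while (word != 0)
    {
        word &= word - 1;       /* clear the lowest set bit */
        count++;
    }
    return count;
}

/* 64-bit variant built from the two 32-bit halves. */
static int
popcount64_portable(uint64_t word)
{
    return popcount32_portable((uint32_t) (word >> 32)) +
        popcount32_portable((uint32_t) word);
}
```

A runtime-dispatched design would call a fallback like this only when the accelerated instruction is unavailable.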

RE: Popcount optimization using AVX512

2024-03-11 Thread Amonson, Paul D
> -Original Message- > From: Nathan Bossart > Sent: Thursday, March 7, 2024 1:36 PM > Subject: Re: Popcount optimization using AVX512 I will be splitting the request into 2 patches. I am attaching the first patch (refactoring only) and I updated the commitfest entry to match this patch.

RE: Popcount optimization using AVX512

2024-03-05 Thread Amonson, Paul D
-Original Message- >From: Nathan Bossart >Sent: Tuesday, March 5, 2024 8:38 AM >To: Amonson, Paul D >Cc: Andres Freund ; Alvaro Herrera >; Shankaran, Akash ; Noah >Misch ; Tom Lane ; Matthias van de >Meent ; >pgsql-hackers@lists.postgresql.org >Subject

RE: Popcount optimization using AVX512

2024-03-05 Thread Amonson, Paul D
apply and build. It succeeded. Thanks, Paul -Original Message- From: Nathan Bossart Sent: Monday, March 4, 2024 2:21 PM To: Amonson, Paul D Cc: Andres Freund ; Alvaro Herrera ; Shankaran, Akash ; Noah Misch ; Tom Lane ; Matthias van de Meent ; pgsql-hackers@lists.postgresql.org S

RE: Popcount optimization using AVX512

2024-03-04 Thread Amonson, Paul D
an be picked up by a committer, given it has been reviewed by multiple committers so far? The scope of the change is pretty contained as well. [0] https://wiki.postgresql.org/wiki/Submitting_a_Patch Thanks, Paul -Original Message- From: Nathan Bossart Sent: Friday, March 1, 2024 1

RE: Popcount optimization using AVX512

2024-02-27 Thread Amonson, Paul D
. Both meson and autoconf are updated with the new refactor. I am attaching the new patch. Paul -Original Message- From: Amonson, Paul D Sent: Monday, February 26, 2024 9:57 AM To: Amonson, Paul D ; Andres Freund Cc: Alvaro Herrera ; Shankaran, Akash ; Nathan Bossart ; Noah Misch

RE: Popcount optimization using AVX512

2024-02-26 Thread Amonson, Paul D
. Can someone with Windows/MSVC experience help me? * Code: https://github.com/paul-amonson/postgresql/tree/popcnt_patch * CI build: https://cirrus-ci.com/task/4927666021728256 Thanks, Paul -Original Message- From: Amonson, Paul D Sent: Wednesday, February 21, 2024 9:36 AM To: Andres F

RE: Popcount optimization using AVX512

2024-02-21 Thread Amonson, Paul D
ild is at https://cirrus-ci.com/task/4927666021728256. Thanks, Paul -Original Message- From: Andres Freund Sent: Monday, February 12, 2024 12:37 PM To: Amonson, Paul D Cc: Alvaro Herrera ; Shankaran, Akash ; Nathan Bossart ; Noah Misch ; Tom Lane ; Matthias van de Meent

RE: Popcount optimization using AVX512

2024-02-12 Thread Amonson, Paul D
64/512bit x86 implementations). I'm not an expert in meson, but splitting might add complexity to meson.build. Could you elaborate if there are other benefits to the split file approach? Paul -Original Message- From: Andres Freund Sent: Friday, February 9, 2024 10:35 AM To: Amons

RE: Popcount optimization using AVX512

2024-02-09 Thread Amonson, Paul D
OS or hypervisor even if the CPU supports AVX512. The big change is adding all old and new build support to meson. I am new to meson/ninja so please review carefully. Thanks, Paul -Original Message- From: Alvaro Herrera Sent: Wednesday, February 7, 2024 2:13 AM To: Amonson, Paul D Cc

RE: Popcount optimization using AVX512

2024-02-06 Thread Amonson, Paul D
1:49 AM To: Shankaran, Akash Cc: Nathan Bossart ; Noah Misch ; Amonson, Paul D ; Tom Lane ; Matthias van de Meent ; pgsql-hackers@lists.postgresql.org Subject: Re: Popcount optimization using AVX512 On 2024-Jan-25, Shankaran, Akash wrote: > With the updated patch, we observed significant im