Re: Using pg_bitutils.h in tidbitmap.c.

2025-04-23 Thread John Naylor
ly of the bitscan, which is 3 or 4 cycles on modern hardware, but like you said, I'm not sure if that matters. -- John Naylor Amazon Web Services

Re: Feature freeze

2025-04-09 Thread John Naylor
On Tue, Apr 8, 2025 at 10:13 PM Daniel Gustafsson wrote: > > I find both of the above needlessly confusing when we instead could use UTC > which is a more universally understood concept. Indeed, that's what the "U" stands for, after all. :-) -- John Naylor Amazon Web Services

Re: Improve CRC32C performance on SSE4.2

2025-04-06 Thread John Naylor
ning the smoke test! I fixed that, made a couple more tiny comment changes and pushed. -- John Naylor Amazon Web Services

Re: Improve CRC32C performance on SSE4.2

2025-04-02 Thread John Naylor
On Tue, Apr 1, 2025 at 11:25 PM Nathan Bossart wrote: > > On Tue, Apr 01, 2025 at 05:33:02PM +0700, John Naylor wrote: > > On Thu, Mar 27, 2025 at 2:55 AM Devulapalli, Raghuveer > > wrote: > >> (2) Might be apt to rename pg_crc32c_sse42*.c to pg_crc32c_x86*.c s

Re: CRC32C Parallel Computation Optimization on ARM

2025-04-01 Thread John Naylor
entry Returned with Feedback. -- John Naylor Amazon Web Services

Re: Improve CRC32C performance on SSE4.2

2025-04-01 Thread John Naylor
y when talking about specific intrinsics and prefer "AVX-512" elsewhere, to head off potential future confusion with Arm PMULL. -- John Naylor Amazon Web Services From b3af802cf28cdc0937e163dbba005f823d74e0d0 Mon Sep 17 00:00:00 2001 From: John Naylor Date: Tue, 25 Mar 2025 19:22:32 +070

Re: [PATCH] SVE popcount support

2025-03-27 Thread John Naylor
24: error: call to 'svwhilelt_b8' is ambiguous; argument 1 has type 'int32_t' but argument 2 has type 'uint64_t' 29 | pred = svwhilelt_b8(0, sizeof(buf)); |^~~~ Compiler returned: 1 ``` ...Changing it to pred = svw

Re: Improve CRC32C performance on SSE4.2

2025-03-25 Thread John Naylor
On Mon, Mar 24, 2025 at 6:37 PM John Naylor wrote: > > I'll take a look at the configure > checks soon, since I had some questions there. One other thing I forgot to mention: The previous test function had local constants that the compiler was able to fold, resulting in no

Re: Improve CRC32C performance on SSE4.2

2025-03-25 Thread John Naylor
On Mon, Mar 24, 2025 at 6:37 PM John Naylor wrote: > I'll take a look at the configure > checks soon, since I had some questions there. I'm leaning towards a length limit for v15-0001 so that inlined instructions are likely to be unrolled. Aside from lack of commit message, I t

Re: [PATCH] SVE popcount support

2025-03-24 Thread John Naylor
ore, but I'm confused that the loops are unrolled in the link-test functions as well. > * For both Neon and SVE, I do see improvements with looping over 4 > registers at a time, so IMHO it's worth doing so even if it performs the > same as 2-register blocks on some hardware. I wonder if alignment matters for these larger blocks. -- John Naylor Amazon Web Services

Re: Improve CRC32C performance on SSE4.2

2025-03-24 Thread John Naylor
4 9.547 6.095 ... > 256 31.399 10.035 Thanks for testing! Looks good. I'll take a look at the configure checks soon, since I had some questions there. -- John Naylor Amazon Web Services

Re: CRC32C Parallel Computation Optimization on ARM

2025-03-18 Thread John Naylor
of renaming it to *_common.c or perhaps *_fallback.c , since the addition from this patch is still kind of a fallback where we won't have the hardware needed for faster algorithms, as discussed elsewhere. 0002-3 puts the relevant parts into a header so that the hardware details can be

Re: Not-terribly-safe checks for CRC intrinsic support

2025-03-17 Thread John Naylor
o be explaining the choice well enough. > BTW, it looks to me like PGAC_AVX512_POPCNT_INTRINSICS is at similar > hazard, but I'm not entirely sure how to fix that one. "buf" is the variable there that we're loading from, so that would be the one to make global. -- John Naylor Amazon Web Services

Re: vacuumdb changes for stats import/export

2025-03-15 Thread John Naylor
On Fri, Mar 7, 2025 at 4:47 AM Nathan Bossart wrote: > > On Thu, Mar 06, 2025 at 06:30:59PM +0700, John Naylor wrote: > > IIUC correctly, pg_statistic doesn't store stats on itself, so this > > causes the query result to always contain pg_statistic -- does that

Re: vacuumdb changes for stats import/export

2025-03-15 Thread John Naylor
On Wed, Mar 12, 2025 at 12:00 AM Nathan Bossart wrote: > > On Mon, Mar 10, 2025 at 10:08:49AM -0500, Nathan Bossart wrote: > > On Mon, Mar 10, 2025 at 12:35:22PM +0700, John Naylor wrote: > >> I have no further comments. > > > > Thanks. I'll give thi

Re: Improve CRC32C performance on SSE4.2

2025-03-15 Thread John Naylor
ned upthread, the 128-bit implementation regresses on Zen 2 up to at least 256 bytes. -- John Naylor Amazon Web Services

Re: CRC32C Parallel Computation Optimization on ARM

2025-03-11 Thread John Naylor
and I'd like to give that author credit for initiating that work, as long as there is no legal issue with that: https://www.postgresql.org/message-id/db9pr08mb6991329a73923bf8ed4b3422f5...@db9pr08mb6991.eurprd08.prod.outlook.com -- John Naylor Amazon Web Services

Re: Improve CRC32C performance on SSE4.2

2025-03-11 Thread John Naylor
On Wed, Mar 5, 2025 at 10:52 PM Nathan Bossart wrote: > > On Wed, Mar 05, 2025 at 08:51:21AM +0700, John Naylor wrote: > > That was my hunch too, but I wanted to be more sure, so I modified the > > benchmark so it doesn't know the address of the next calculation until

Re: maintenance_work_mem = 64kB doesn't work for vacuum

2025-03-11 Thread John Naylor
hat test never got committed. In any case I found it worked back in July: https://www.postgresql.org/message-id/CANWCAZZb7wd403wHQQUJZjkF%2BRWKAAa%2BWARP0Rj0EyMcfcdN9Q%40mail.gmail.com -- John Naylor Amazon Web Services

Re: Improve CRC32C performance on SSE4.2

2025-03-11 Thread John Naylor
ed so that the CF bot can't find it, since it breaks the tests in the original perf test (It's not for commit anyway). Adding back AVX-512 should be fairly mechanical, since Raghuveer and Nathan have already done the work needed for that. -- John Naylor Amazon Web Services From 298cbb2

Re: Improve CRC32C performance on SSE4.2

2025-03-11 Thread John Naylor
On Tue, Mar 11, 2025 at 4:47 AM Nathan Bossart wrote: > > On Mon, Mar 10, 2025 at 03:48:31PM +0700, John Naylor wrote: > > On Tue, Mar 4, 2025 at 2:11 AM Nathan Bossart > > wrote: > >> Overall, I wish we could avoid splitting things into separate files and

Re: CRC32C Parallel Computation Optimization on ARM

2025-03-10 Thread John Naylor
r name and the patents cited therein here: https://www.postgresql.org/message-id/CANWCAZbkt89_fVAaCAGBMznwA_xh%3D2Ci5q4GZytZHKjZAEjCRQ%40mail.gmail.com -- John Naylor Amazon Web Services

Re: Doc fix of aggressive vacuum threshold for multixact members storage

2025-03-06 Thread John Naylor
On Wed, Mar 5, 2025 at 12:06 PM Alex Friedman wrote: > > Good points, thank you. I'm good with going ahead as you've suggested. Pushed, thanks for the patch! -- John Naylor Amazon Web Services

Re: vacuumdb changes for stats import/export

2025-03-06 Thread John Naylor
On Wed, Mar 5, 2025 at 12:13 AM Nathan Bossart wrote: > > On Tue, Mar 04, 2025 at 01:05:17PM +0700, John Naylor wrote: > > On Mon, Mar 3, 2025 at 11:21 PM Nathan Bossart > > wrote: > >> I did that in v3. I also tried to break up this comment into bullet points >

Re: Improve CRC32C performance on SSE4.2

2025-03-04 Thread John Naylor
On Wed, Mar 5, 2025 at 12:36 AM Nathan Bossart wrote: > > On Tue, Mar 04, 2025 at 12:09:09PM +0700, John Naylor wrote: > > On Tue, Mar 4, 2025 at 2:11 AM Nathan Bossart > > wrote: > >> This could potentially lead to a small regression for machines with SSE

Re: Doc fix of aggressive vacuum threshold for multixact members storage

2025-03-04 Thread John Naylor
n't really predict how the code will change, and a doc-update reminder here seems like closing the door after the horses have left the barn. -- John Naylor Amazon Web Services

reduce overhead in shared memory TID store

2025-03-04 Thread John Naylor
saved local pointers. We could expand that concept, but it'd be invasive and unreliable. There are other things we can try, and I'll update the thread as I find them. -- John Naylor Amazon Web Services From f0bed5ceb72c34a9ad541976247d0ae2b88d17d8 Mon Sep 17 00:00:00 2001 From: John

Re: Doc fix of aggressive vacuum threshold for multixact members storage

2025-03-03 Thread John Naylor
change is actually to move to 64-bit offsets, as was proposed here and has some enthusiastic support: https://www.postgresql.org/message-id/CACG=ezawg7_nt-8ey4akv2w9lculthhknwcawmbgeetnjrj...@mail.gmail.com I've attached v5 which is just v4 with only the doc changes and a draft commit message. I

Re: vacuumdb changes for stats import/export

2025-03-03 Thread John Naylor
On Mon, Mar 3, 2025 at 11:21 PM Nathan Bossart wrote: > > On Mon, Mar 03, 2025 at 05:58:43PM +0700, John Naylor wrote: > True. One small thing we could do is to require "found_objs" (the double > pointer) to always be non-NULL, but that just compels some callers to >

Re: Improve CRC32C performance on SSE4.2

2025-03-03 Thread John Naylor
d a runtime check. I briefly tried the attribute approach and it doesn't work for me. If you can get it to work, go ahead and share how that's done, but keep in mind that we're not gcc/clang only -- it also has to work for MSVC's "__forceinline"... -- John Naylor Amazon Web Services

Re: Improve CRC32C performance on SSE4.2

2025-03-03 Thread John Naylor
byte of input, and other overheads, so I think it would still be very slow. > Overall, I wish we could avoid splitting things into separate files and > adding more header file gymnastics, but maybe there isn't much better we > can do without overhauling the CPU feature detection code. Y

Re: vacuumdb changes for stats import/export

2025-03-03 Thread John Naylor
On Sat, Mar 1, 2025 at 3:42 AM Nathan Bossart wrote: > > On Thu, Feb 27, 2025 at 04:36:04PM +0700, John Naylor wrote: > > I had to read it several times before I noticed the difference between > > "* found_objs" and "*found_objs". Maybe some extra spaci

Re: SIMD optimization for list_sort

2025-03-03 Thread John Naylor
rchitecture first, before being asked to look at code. Tuple sort has special challenges, so when you're ready to start a new thread for that, I'll be curious about your findings. -- John Naylor Amazon Web Services

Re: SIMD optimization for list_sort

2025-02-28 Thread John Naylor
having repeated values in the range 1-10 we still > see a gain of around 20% in throughput. > We will collect more data for low cardinality inputs and with AVX2 too. Thanks for the news, those are encouraging results. -- John Naylor Amazon Web Services

Re: Improve CRC32C performance on SSE4.2

2025-02-28 Thread John Naylor
nly for runtime-check builds 0004: the PCLMUL path for SSE4.2 builds. This uses a function pointer for long-ish input and the same above inlined path for short input (whether constant or not). So it gets the best of both worlds. There is also a separate issue: On Tue, Feb 25, 2025 at 6:05 PM Joh

Re: vacuumdb changes for stats import/export

2025-02-27 Thread John Naylor
* the list of tables to process. When 'objects' is NULL, all tables in the I had to read it several times before I noticed the difference between "* found_objs" and "*found_objs". Maybe some extra spacing and breaks would help, or other reorganization. -- John Naylor Amazon Web Services

Re: Improve CRC32C performance on SSE4.2

2025-02-26 Thread John Naylor
lization steps. I tried to imply that in my last review, but maybe I should have been more explicit. I think the least painful step is to take the x86 initialization from v10, which is looking great, but - keep separate initialization files - don't whack around the runtime representation, at least not in the same patch -- John Naylor Amazon Web Services

Re: Doc fix of aggressive vacuum threshold for multixact members storage

2025-02-25 Thread John Naylor
's unlikely the actual > computation will change. I'm on the fence about putting a hint in the C file, but the computation has changed in the past, see commit b4d4ce1d50bbdf , so it's a reasonable idea. -- John Naylor Amazon Web Services

Re: Improve CRC32C performance on SSE4.2

2025-02-25 Thread John Naylor
cpucap_x86(); +#else // ARM: +pg_cpucap_arm(); +#endif +} If we're going to have a single file for the init step, we don't need this -- we'd just have a different definition of pg_cpucap_initialize() in each part, with a default that only adds the "init" slot: #if de

Re: Improve CRC32C performance on SSE4.2

2025-02-25 Thread John Naylor
On Tue, Feb 18, 2025 at 1:40 PM John Naylor wrote: > > On Tue, Feb 18, 2025 at 12:41 AM Nathan Bossart > wrote: > > While this needn't block this patch set, I do find the dispatch code to be > > pretty complicated. Maybe we can improve that in the future by using

Re: Improve CRC32C performance on SSE4.2

2025-02-25 Thread John Naylor
dy been initialized) by using a static variable or some > other approach. Does this make sense? Correct me if I'm misunderstanding, but this sounds like in every frontend program we'd need to know what the first call was, which seems less maintainable than just initializing at the start of every frontend program. -- John Naylor Amazon Web Services

Re: Change GUC hashtable to use simplehash?

2025-02-24 Thread John Naylor
On Sat, Feb 15, 2025 at 12:28 PM John Naylor wrote: > > On Fri, Feb 14, 2025 at 6:40 PM Anton A. Melnikov > wrote: > > Yes, of course. I tested this patch on the current master at 9e17ac997 > > in the same way and found no valgrind errors. > > Thanks, I'll pus

Re: Parallel heap vacuum

2025-02-23 Thread John Naylor
unlock the store when it has enough blocks. Sometimes my brainstorms are unworkable for some reason I failed to think about, but this way seems 95% simpler -- we would only need to teach the existing iteration machinery to take a "start key". -- John Naylor Amazon Web Services

Re: Improve CRC32C performance on SSE4.2

2025-02-20 Thread John Naylor
apability if (unlikely((pg_cpucap & PGCPUCAP_CRC32C_INIT) == 0)) { pg_cpucap_crc32c(); // also sets PGCPUCAP_CRC32C_INIT if (pg_cpucap & PGCPUCAP_CRC32C) return COMP_CRC32C_HW(crc, data, len); } #endif // ...fallthrough to SB8 -- John Naylor Amazon Web Services
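The lazy-initialization branch in the fragment above can be modelled in portable C. This is only a minimal sketch under stated assumptions, not the code from the patch: the names `cpucap`, `CAP_CRC32C_INIT`, `CAP_CRC32C_HW`, `detect_hw_crc32c()`, `crc32c_hw()` and `crc32c_update()` are hypothetical stand-ins, hardware detection is stubbed out so it always fails, and the fallback is a plain bitwise loop rather than the slicing-by-8 ("SB8") code the fragment falls through to.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability bits, modelled on the fragment above. */
#define CAP_CRC32C_INIT (1u << 0)   /* detection has already run */
#define CAP_CRC32C_HW   (1u << 1)   /* hardware CRC32C is usable */

static uint32_t cpucap = 0;

/* Stand-in for a cpuid/getauxval probe; always reports "no hardware". */
static int
detect_hw_crc32c(void)
{
    return 0;
}

/* Runs detection and records that it has run. */
static void
cpucap_crc32c(void)
{
    cpucap |= CAP_CRC32C_INIT;
    if (detect_hw_crc32c())
        cpucap |= CAP_CRC32C_HW;
}

/* Portable bitwise CRC-32C (reflected polynomial 0x82F63B78). */
static uint32_t
crc32c_sw(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = data;

    while (len-- > 0)
    {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}

/* Stand-in for the intrinsic-based path; here it reuses the portable loop. */
static uint32_t
crc32c_hw(uint32_t crc, const void *data, size_t len)
{
    return crc32c_sw(crc, data, len);
}

uint32_t
crc32c_update(uint32_t crc, const void *data, size_t len)
{
    /* The inner parentheses matter: == binds tighter than & in C. */
    if ((cpucap & CAP_CRC32C_INIT) == 0)
        cpucap_crc32c();
    if (cpucap & CAP_CRC32C_HW)
        return crc32c_hw(crc, data, len);
    return crc32c_sw(crc, data, len);   /* fall through to software */
}
```

With this shape, only the first call pays for detection; later calls test one or two bits of a cached word before choosing a path.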

Re: Improve CRC32C performance on SSE4.2

2025-02-18 Thread John Naylor
.c and build them > > unconditionally for each platform > > +1. Sounds perfect. We should also move the avx512 runtime detection of > popcount here. [1] https://stackoverflow.com/questions/1113409/attribute-constructor-equivalent-in-vc [2] https://www.postgresql.org/message-id/ca+hukgks64zjezv9y9mpcb-j0i+flgiv3fadwsh_3scavdr...@mail.gmail.com -- John Naylor Amazon Web Services

Re: SIMD optimization for list_sort

2025-02-18 Thread John Naylor
e support. It seems AVX-512 is not supported well on client side, where most developers work. And availability of any flavor is not guaranteed on server either. Something to keep in mind. -- John Naylor Amazon Web Services

Re: Sort functions with specialized comparators

2025-02-17 Thread John Naylor
On Wed, Feb 5, 2025 at 8:34 PM John Naylor wrote: > > > On Tue, Jan 14, 2025 at 4:22 PM Andrey Borodin wrote: > > > > Looks good to me. > > Nice stats for some cleaning up 34 insertions(+), 48 deletions(-). > > Great, I've attached v11 with a draft commit

Re: Improve CRC32C performance on SSE4.2

2025-02-17 Thread John Naylor
On Tue, Feb 18, 2025 at 12:41 AM Nathan Bossart wrote: > > On Mon, Feb 17, 2025 at 05:58:01PM +0700, John Naylor wrote: > > I tried using branching for the runtime check, and this looks like the > > way to go: > > - Existing -msse4.2 builders will still call directly, but

Re: Improve CRC32C performance on SSE4.2

2025-02-17 Thread John Naylor
untime checks, as was proposed for popcount in the AVX-512 CRC thread, but with branching my model was Andres' sketch here: https://www.postgresql.org/message-id/20240731023918.ixsfbeuub6e76one%40awork3.anarazel.de -- John Naylor Amazon Web Services From f327b7fcb588100d2dc7483369cfd36380210715 Mon Sep 1

Re: Change GUC hashtable to use simplehash?

2025-02-14 Thread John Naylor
In the time after your initial report, I misremembered what the bad commit was. Sorry about that! -- John Naylor Amazon Web Services

Re: Change GUC hashtable to use simplehash?

2025-02-13 Thread John Naylor
Hi Anton, could you please test if the attached passes for you? This seems the simplest way. -- John Naylor Amazon Web Services diff --git a/src/include/common/hashfn_unstable.h b/src/include/common/hashfn_unstable.h index e07c0226c1..bb09f87abe 100644 --- a/src/include/common/hashfn_unstable.h

Re: Parallel heap vacuum

2025-02-13 Thread John Naylor
. I'd like to avoid making newly introduced > codes more complex by adding yet another new code on top of that. Would it be simpler to make only phase III parallel? In other words, how much of the infrastructure and complexity needed for parallel phase I is also needed for phase III? -- John Naylor Amazon Web Services

Re: Improve CRC32C performance on SSE4.2

2025-02-13 Thread John Naylor
y average = 4.786 ms latency average = 4.942 ms 96 latency average = 5.392 ms latency average = 5.376 ms latency average = 5.367 ms 112 latency average = 5.730 ms latency average = 5.859 ms latency average = 5.734 ms -- John Naylor Amazon Web Services From acb63cddd8c8220db97ae0b012bf4f2fb5174

Re: Change GUC hashtable to use simplehash?

2025-02-12 Thread John Naylor
econd [2] at ecb8226a after reverting in the 235328ee. Three weeks ago, you said "Agreed that reverting seems as a preferable way, and here's why." I assumed that meant you tested it, so my mistake. I'll take a look. -- John Naylor Amazon Web Services

Re: Change GUC hashtable to use simplehash?

2025-02-12 Thread John Naylor
On Thu, Feb 13, 2025 at 3:42 AM Anton A. Melnikov wrote: > > Hi! > > On 29.01.2025 10:02, John Naylor wrote: > > This is done -- thanks for the report, and for testing. > > It's good that this is done! But i still see the problem. Hi, my understanding was you prev

Re: Improve CRC32C performance on SSE4.2

2025-02-12 Thread John Naylor
SELECT crc32c(repeat('A', 128)::bytea); Maybe it's sufficient to have 127, 128, 129 for lengths, and maybe a couple more. -- John Naylor Amazon Web Services From 57952d1f89f0c3a4a2d28399344e9335f8bee72b Mon Sep 17 00:00:00 2001 From: John Naylor Date: Wed, 12 Feb 2025 15:27:16

Re: Fix punctuation errors in PostgreSQL documentation

2025-02-11 Thread John Naylor
On Mon, Feb 10, 2025 at 6:34 PM John Naylor wrote: > Thanks for the patch! I will push this after the upcoming minor releases. This is done. -- John Naylor Amazon Web Services

Re: Improve CRC32C performance on SSE4.2

2025-02-10 Thread John Naylor
above. If you care about > keeping the performance on Nehalem, then I am happy to update the choose > function to pick the right pointer accordingly. Let me know which one you > would prefer. Okay, Nehalem is 17 years old, and the additional cpuid check would still work on hardware 14-15 years old, so I think it's fine to bump the requirement for runtime hardware support. -- John Naylor Amazon Web Services

Re: Fix punctuation errors in PostgreSQL documentation

2025-02-10 Thread John Naylor
r the patch! I will push this after the upcoming minor releases. -- John Naylor Amazon Web Services

Re: branch-free tuplesort partitioning

2025-02-10 Thread John Naylor
with only two distinct steps: https://www.postgresql.org/message-id/PH7P220MB1533DA211DF219996760CBB7D9EB2%40PH7P220MB1533.NAMP220.PROD.OUTLOOK.COM -- John Naylor Amazon Web Services branchless-lomuto-20250210.ods Description: application/vnd.oasis.opendocument.spreadsheet

Re: Improve CRC32C performance on SSE4.2

2025-02-09 Thread John Naylor
t sure if anyone would notice, and I think we could fix it for people using a packaged binary by having a fallback wrapper function that just calls the SSE 4.2 "tail", as 0002 calls it. -- John Naylor Amazon Web Services test-crc.sh Description: application/shellscript From 5b329ccf89986ab5e6dd

Sort functions with specialized comparators

2025-02-05 Thread John Naylor
a hand-written declaration in the header. I plan to commit this next week unless there are objections. -- John Naylor Amazon Web Services From 868506ebef1cd20aee053e4767aac22cd40b1da0 Mon Sep 17 00:00:00 2001 From: "Andrey M. Borodin" Date: Sat, 18

Re: Comment cleanup - it's vs its

2025-01-28 Thread John Naylor
period. Some comments use the two-space style anyway. Pushed, thanks for the patch! -- John Naylor Amazon Web Services

Re: Comment cleanup - it's vs its

2025-01-28 Thread John Naylor
is, too: -# DROP DATABASE should drops it's slots, including active slots. +# DROP DATABASE should drops its slots, including active slots. "should drop" -- John Naylor Amazon Web Services

Re: Comment cleanup - it's vs its

2025-01-28 Thread John Naylor
his one: - * try read non-locale sign, it's happen only if format is not exact + * try read non-locale sign, it happens only if format is not exact ...sounds better to me with "which happens". -- John Naylor Amazon Web Services

Re: Change GUC hashtable to use simplehash?

2025-01-28 Thread John Naylor
On Thu, Jan 23, 2025 at 8:52 AM Anton A. Melnikov wrote: > > Hi! > > On 22.01.2025 11:37, John Naylor wrote: > > On Fri, Jan 17, 2025 at 4:50 PM John Naylor wrote: > >> > >> It would be a lot more readable to revert the offending commit > >> inst

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2025-01-28 Thread John Naylor
he title of the paper is "Fast CRC Computation for iSCSI Polynomial Using CRC32 Instruction", so unsurprisingly it does improve the SSE42 version. With a few dozen lines of code, I can get ~3x speedup on page-sized inputs. At the very least we want to use this technique on Arm [3], and the

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2025-01-22 Thread John Naylor
alds/linux/blob/master/arch/x86/crypto/crc32c-pcl-intel-asm_64.S ...so I'm unclear if these patents are applicable to software implementations. They also seem to be expired, but I am not a lawyer. Could you look into this please? Even if we do end up with AVX-512, this would be a good fallback. -- John Naylor Amazon Web Services

Re: Change GUC hashtable to use simplehash?

2025-01-22 Thread John Naylor
On Fri, Jan 17, 2025 at 4:50 PM John Naylor wrote: > > It would be a lot more readable to revert the offending commit > instead, since its predecessor had a much simpler bytewise loop. This will require a backpatch to v17. I'll take care of that soon. -- John Naylor Amazon Web Services

Re: Change GUC hashtable to use simplehash?

2025-01-17 Thread John Naylor
'd be inclined to just remove the pg_rightmost_one_pos64 call > > in favor of the other coding you suggest. > > Here is a patch like that. It would be a lot more readable to revert the offending commit instead, since its predecessor had a much simpler bytewise loop. -- John Naylor Amazon Web Services

Re: [PATCH] Hex-coding optimizations using SVE on ARM.

2025-01-15 Thread John Naylor
nd on my machine bytewise load/stores are somewhere in the middle: master 1158.969 ms v3 776.791 ms variant 4A 775.777 ms variant 4B 969.945 ms https://godbolt.org/z/ajToordKq -- John Naylor Amazon Web Services

Re: [PATCH] Hex-coding optimizations using SVE on ARM.

2025-01-14 Thread John Naylor
On Tue, Jan 14, 2025 at 11:57 PM Nathan Bossart wrote: > > On Tue, Jan 14, 2025 at 12:59:04AM -0500, Tom Lane wrote: > > John Naylor writes: > >> We can do about as well simply by changing the nibble lookup to a byte > >> lookup, which works on every compiler and ar
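The difference between a nibble lookup and a byte lookup for hex encoding can be sketched as follows. This is an illustration under assumptions, not the code from the thread: `hex_pairs`, `init_hex_pairs()`, `hex_encode_nibble()` and `hex_encode_byte()` are hypothetical names, and the table is filled at run time here for brevity where real code would more likely use a precomputed constant.

```c
#include <stddef.h>
#include <string.h>

static const char hex_digits[] = "0123456789abcdef";

/* Hypothetical 256-entry table: two output characters per input byte. */
static char hex_pairs[256][2];

/* Fills the byte table once, before hex_encode_byte() is used. */
static void
init_hex_pairs(void)
{
    for (int i = 0; i < 256; i++)
    {
        hex_pairs[i][0] = hex_digits[i >> 4];
        hex_pairs[i][1] = hex_digits[i & 0xF];
    }
}

/* Nibble lookup: two table reads and two stores per input byte. */
size_t
hex_encode_nibble(const unsigned char *src, size_t len, char *dst)
{
    for (size_t i = 0; i < len; i++)
    {
        *dst++ = hex_digits[src[i] >> 4];
        *dst++ = hex_digits[src[i] & 0xF];
    }
    return len * 2;
}

/* Byte lookup: one table read yields both output characters. */
size_t
hex_encode_byte(const unsigned char *src, size_t len, char *dst)
{
    for (size_t i = 0; i < len; i++)
    {
        memcpy(dst, hex_pairs[src[i]], 2);
        dst += 2;
    }
    return len * 2;
}
```

Both functions produce identical output; the byte variant trades a larger table (512 bytes instead of 16) for fewer lookups per input byte, and it needs no architecture-specific intrinsics.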

Re: Sort functions with specialized comparators

2025-01-14 Thread John Naylor
need to test. That's not as clear-cut as I thought. To avoid regressions, I've gone back to an earlier idea to pass the direction to the comparator, but this time keep it simple by using the same comparator for sort and unique, similar to v9. -- John Naylor Amazon Web Services From cf2f605
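The idea of passing the sort direction to the comparator, and reusing that one comparator for both sorting and duplicate removal, might look roughly like this. It is a sketch under assumptions rather than the patch itself: `cmp_int32_dir()`, `sort_int32()` and `unique_int32()` are hypothetical names, and a simple insertion sort stands in for whatever specialized sort the patch actually uses.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Three-way comparison; 'arg' points to a bool that is true for ascending. */
static int
cmp_int32_dir(const int32_t *a, const int32_t *b, const void *arg)
{
    bool ascending = *(const bool *) arg;
    int  r = (*a > *b) - (*a < *b);     /* avoids overflow of a - b */

    return ascending ? r : -r;
}

/* Insertion sort, used here only as a stand-in for a specialized sort. */
void
sort_int32(int32_t *v, size_t n, const void *arg)
{
    for (size_t i = 1; i < n; i++)
    {
        int32_t key = v[i];
        size_t  j = i;

        while (j > 0 && cmp_int32_dir(&v[j - 1], &key, arg) > 0)
        {
            v[j] = v[j - 1];
            j--;
        }
        v[j] = key;
    }
}

/* Removes adjacent duplicates from sorted input; returns the new length. */
size_t
unique_int32(int32_t *v, size_t n, const void *arg)
{
    size_t out = 0;

    if (n == 0)
        return 0;
    for (size_t i = 1; i < n; i++)
    {
        /* Same comparator as the sort: equal elements compare as 0. */
        if (cmp_int32_dir(&v[out], &v[i], arg) != 0)
            v[++out] = v[i];
    }
    return out + 1;
}
```

Because equality does not depend on direction, the duplicate-removal pass can share the comparator without needing a second, direction-free one.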

Re: [PATCH] Hex-coding optimizations using SVE on ARM.

2025-01-13 Thread John Naylor
1158.700 ms v2: Time: 777.443 ms If we need to do much better than this, it seems better to send the data to the client as binary, if possible. -- John Naylor Amazon Web Services diff --git a/src/backend/utils/adt/encode.c b/src/backend/utils/adt/encode.c index 4a6fcb56cd..8b059bc834 100644 ---

Sort functions with specialized comparators

2025-01-07 Thread John Naylor
ting to note that not terribly long ago isort was an insertion sort, hence the name: commit 8d1f239003d0245dda636dfa6cf0add13bee69d6 Author: Tom Lane Date: Sun Mar 15 23:22:03 2015 -0400 Replace insertion sort in contrib/intarray with qsort(). -- John Naylor Amazon Web Services

Re: Sort functions with specialized comparators

2025-01-06 Thread John Naylor
On Tue, Jan 7, 2025 at 12:47 AM Nathan Bossart wrote: > > On Mon, Jan 06, 2025 at 05:54:29PM +0700, John Naylor wrote: > > Those functions from common/int.h are probably not good when inlined > > (see comment there). > > +1. In fact, I think this comment was adde

Re: Sort functions with specialized comparators

2025-01-06 Thread John Naylor
On Mon, Jan 6, 2025 at 10:51 PM Andrey M. Borodin wrote: > > > On 6 Jan 2025, at 15:54, John Naylor wrote: > > argument. Like some other patches in this series, this does have the > > side effect of removing the ability to skip qunique(), so that should > > be benc

Re: Sort functions with specialized comparators

2025-01-06 Thread John Naylor
On Sun, Jan 5, 2025 at 1:15 AM Andrey M. Borodin wrote: > > > On 4 Jan 2025, at 10:24, John Naylor wrote: > > > > v6-0001: > > > > +static int > > +unique_cmp(const void *a, const void *b) > > +{ > > + int32 aval = *((const int32 *) a); > >

Re: Fix crash when non-creator being an iteration on shared radix tree

2025-01-05 Thread John Naylor
to get working, but it didn't have the intended effect so I'll leave it alone. -- John Naylor Amazon Web Services

Re: Sort functions with specialized comparators

2025-01-03 Thread John Naylor
On Sat, Dec 21, 2024 at 12:16 AM Andrey M. Borodin wrote: > > > > > On 16 Dec 2024, at 14:02, John Naylor wrote: > > > > Sorry, I forgot this part earlier. Yes, let's have the private function. > > PFA v6. v6-0001: +static int +unique_cmp(const v

Re: Incorrect CHUNKHDRSZ in nodeAgg.c

2025-01-03 Thread John Naylor
> and two powers-of-two multiplied are always a power-of-two value. > Unfortunately, TupleHashEntryData is 24 bytes and I don't see any easy > way to shrink it to 16 bytes. FYI, there is a proposal for that at https://www.postgresql.org/message-id/817d244237878cebdff0bc363718feaf49a1ea7

Re: Fix crash when non-creator being an iteration on shared radix tree

2024-12-20 Thread John Naylor
On Sat, Dec 21, 2024 at 2:17 AM Masahiko Sawada wrote: > > On Fri, Dec 20, 2024 at 2:27 AM John Naylor wrote: > > v3-0001 allocates the iter data in the caller's context. It's a small > > patch, but still a core behavior change so proposed for master-only. I >

Re: Fix crash when non-creator being an iteration on shared radix tree

2024-12-20 Thread John Naylor
On Fri, Dec 20, 2024 at 4:12 AM Masahiko Sawada wrote: > > On Wed, Dec 18, 2024 at 10:32 PM John Naylor wrote: > > 2. The iter_context is separate because the creator's new context > > could be a bump context which doesn't support pfree. But above we > > a

Re: Change GUC hashtable to use simplehash?

2024-12-19 Thread John Naylor
, const char *str) { -#if SIZEOF_VOID_P >= 8 +#if SIZEOF_VOID_P >= 8 && !defined(USE_VALGRIND) Any objections? -- John Naylor Amazon Web Services

Re: Fix crash when non-creator being an iteration on shared radix tree

2024-12-18 Thread John Naylor
On Thu, Dec 19, 2024 at 1:00 AM Masahiko Sawada wrote: > > On Tue, Dec 17, 2024 at 11:12 PM John Naylor wrote: > > +1 in general, but I wonder if instead the iter_context should be > > created within RT_BEGIN_ITERATE -- I imagine that would have less > > duplication and

Re: Change GUC hashtable to use simplehash?

2024-12-18 Thread John Naylor
ent results on Arm vs x86. The offending code is not even my preferred way to handle the last word of the string (see f4ad0021af), so if the current way is still not valgrind-clean, I wonder if we should give up and add an exception, since we know any garbage bits are masked off. -- John Nay

Re: Shave a few cycles off our ilog10 implementation

2024-12-18 Thread John Naylor
e a strong reason to be either for or against this patch. Anyone else want to test? create table bi (a bigint, b bigint, c bigint, d bigint, e bigint, f bigint, g bigint, h bigint, i bigint, j bigint); insert into bi select i,i,i,i,i,i,i,i,i,i from generate_Series(1,10_000_000) i; vacuum freeze analyze bi; pgbench -n -T 180 -f bench.sql -- John Naylor Amazon Web Services

Re: Fix crash when non-creator being an iteration on shared radix tree

2024-12-17 Thread John Naylor
t > should be backpatched to v17. +1 in general, but I wonder if instead the iter_context should be created within RT_BEGIN_ITERATE -- I imagine that would have less duplication and would be as safe, but I haven't tried it. Is there some reason not to do that? -- John Naylor Amazon Web Services

Re: CRC32C Parallel Computation Optimization on ARM

2024-12-17 Thread John Naylor
sted expiration" On the other hand, looking at Linux kernel sources, it seems a patch using this technique was contributed by Intel over a decade ago: https://github.com/torvalds/linux/blob/master/arch/x86/crypto/crc32c-pcl-intel-asm_64.S So one more thing to ask our friends at Intel. -- John Naylor Amazon Web Services

Re: Sort functions with specialized comparators

2024-12-16 Thread John Naylor
On Mon, Dec 16, 2024 at 12:58 AM Andrey M. Borodin wrote: > So, let's do the function private for intarray and try to remove as much code > as possible? Sorry, I forgot this part earlier. Yes, let's have the private function. -- John Naylor Amazon Web Services

Re: Sort functions with specialized comparators

2024-12-16 Thread John Naylor
On Mon, Dec 16, 2024 at 12:58 AM Andrey M. Borodin wrote: > > > On 11 Dec 2024, at 11:39, John Naylor wrote: > > Also, I was hoping get an answer for how this would actually affect > > intarray use you've seen in the wild. If the answer is "I don't know

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-12-15 Thread John Naylor
On Sat, Dec 14, 2024 at 10:24 PM Andres Freund wrote: > > Hi, > > On 2024-12-14 12:08:57 +0700, John Naylor wrote: > > On Thu, Jun 13, 2024 at 2:37 AM Andres Freund wrote: > > > > > > It's hard to understand, but a nonetheless helpful page is > >

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-12-13 Thread John Naylor
way? Sure, for an even number of bitflips beyond a small number, we're left with the luck of ordinary collisions, and CRC is not particularly great, but for two messages of the same length, I'm also not sure it's all that bad, either. -- John Naylor Amazon Web Services

Re: typo in a comment of restrictinfo.c

2024-12-13 Thread John Naylor
but 'constructing' ? Pushed, thanks! -- John Naylor Amazon Web Services

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-12-12 Thread John Naylor
#L215 -- John Naylor Amazon Web Services

Re: CRC32C Parallel Computation Optimization on ARM

2024-12-11 Thread John Naylor
On Wed, Dec 11, 2024 at 11:54 PM Nathan Bossart wrote: > > On Wed, Dec 11, 2024 at 02:08:58PM +0700, John Naylor wrote: > > and how light it was. With more hardware support, we can go much lower > > than 1024 bytes, but that can be left for future work. > > Nice. I

Re: CRC32C Parallel Computation Optimization on ARM

2024-12-10 Thread John Naylor
an handle anything up to 8400 bytes in a single pass. There are still some "taste" issues, but I like the overall shape here and how light it was. With more hardware support, we can go much lower than 1024 bytes, but that can be left for future work. -- John Naylor Amazon Web Services Fr

Re: Sort functions with specialized comparators

2024-12-10 Thread John Naylor
On Mon, Dec 9, 2024 at 8:02 PM Andrey M. Borodin wrote: > > > On 6 Dec 2024, at 08:49, John Naylor wrote: > > That's a good thing to raise right now -- intarray currently doesn't > > have one, and we haven't gotten complaints from people trying to sort

Re: fix deprecation mention for age() and mxid_age()

2024-12-10 Thread John Naylor
to mark committed? -- John Naylor Amazon Web Services

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-12-09 Thread John Naylor
suming we have a clean way to make that portable? That would mean that the > > CRCs between major versions would be different, but I think we don't > > guarantee > > that anyway. > > Not sure about that. This is not my expertise and I might need a little time > to figu
