On Mon, Aug 08, 2022 at 01:46:48PM +0700, John Naylor wrote: > Okay, I think it's basically in good shape. Since it should be a bit > faster than a couple versions ago, would you be up for retesting with > the original test having 8 to 512 writers?
Sure thing. The results are similar. As before, the improvements are most visible when the arrays are large. writers head patch 8 672 680 16 639 664 32 701 689 64 705 703 128 628 653 256 576 627 512 530 584 768 450 536 1024 350 494 > And also add the const > markers we discussed upthread? Oops, sorry about that. This is done in v9. > Aside from that, I plan to commit this > week unless there is further bikeshedding. Great, thanks. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
>From 816c56bc8779a1e8ab85db8ca61ba8d3438957d7 Mon Sep 17 00:00:00 2001 From: Nathan Bossart <nathandboss...@gmail.com> Date: Wed, 3 Aug 2022 09:49:04 -0700 Subject: [PATCH v9 1/2] Introduce optimized routine for linear searches through an array of integers. If SSE2 is available, this function uses it to speed up the search. Otherwise, it uses a simple 'for' loop. This is a prerequisite for a follow-up commit that will use this function to optimize [sub]xip lookups in XidInMVCCSnapshot(), but it can be used anywhere that might benefit from such an optimization. It might be worthwhile to add an ARM-specific code path to this function in the future. Author: Nathan Bossart Reviewed by: Andres Freund, John Naylor Discussion: https://postgr.es/m/20220713170950.GA3116318%40nathanxps13 --- src/include/port/pg_lfind.h | 69 +++++++++++++++++++++++++++++++++++++ 1 file changed, 69 insertions(+) create mode 100644 src/include/port/pg_lfind.h diff --git a/src/include/port/pg_lfind.h b/src/include/port/pg_lfind.h new file mode 100644 index 0000000000..4a9484a16d --- /dev/null +++ b/src/include/port/pg_lfind.h @@ -0,0 +1,69 @@ +/*------------------------------------------------------------------------- + * + * pg_lfind.h + * Optimized linear search routines. + * + * Copyright (c) 2022, PostgreSQL Global Development Group + * + * IDENTIFICATION + * src/port/pg_lfind.h + * + *------------------------------------------------------------------------- + */ +#ifndef PG_LFIND_H +#define PG_LFIND_H + +#include "port/simd.h" + +/* + * pg_lfind32 + * + * Returns true if there is an element in 'base' that equals 'key'. Otherwise, + * returns false. + */ +static inline bool +pg_lfind32(uint32 key, uint32 *base, uint32 nelem) +{ + uint32 i = 0; + + /* If possible, use SSE2 intrinsics to speed up the search. */ +#ifdef USE_SSE2 + const __m128i keys = _mm_set1_epi32(key); /* load 4 copies of key */ + uint32 iterations = nelem & ~0xF; /* round down to multiple of 16 */ + + for (; i < iterations; i += 16) + { + /* load the next 16 values into __m128i variables */ + const __m128i vals1 = _mm_loadu_si128((__m128i *) &base[i]); + const __m128i vals2 = _mm_loadu_si128((__m128i *) &base[i + 4]); + const __m128i vals3 = _mm_loadu_si128((__m128i *) &base[i + 8]); + const __m128i vals4 = _mm_loadu_si128((__m128i *) &base[i + 12]); + + /* perform the comparisons */ + const __m128i result1 = _mm_cmpeq_epi32(keys, vals1); + const __m128i result2 = _mm_cmpeq_epi32(keys, vals2); + const __m128i result3 = _mm_cmpeq_epi32(keys, vals3); + const __m128i result4 = _mm_cmpeq_epi32(keys, vals4); + + /* shrink the results into a single variable */ + const __m128i tmp1 = _mm_or_si128(result1, result2); + const __m128i tmp2 = _mm_or_si128(result3, result4); + const __m128i result = _mm_or_si128(tmp1, tmp2); + + /* see if there was a match */ + if (_mm_movemask_epi8(result) != 0) + return true; + } +#endif + + /* Process the remaining elements the slow way. */ + for (; i < nelem; i++) + { + if (key == base[i]) + return true; + } + + return false; +} + +#endif /* PG_LFIND_H */ -- 2.25.1
>From 6942097a6406bca2c52851bbad40e5f679cc18ef Mon Sep 17 00:00:00 2001 From: Nathan Bossart <nathandboss...@gmail.com> Date: Wed, 3 Aug 2022 09:59:28 -0700 Subject: [PATCH v9 2/2] Optimize linear searches in XidInMVCCSnapshot(). This change makes use of the recently-introduced optimized linear search routine to speed up searches through the [sub]xip arrays when possible, which should improve performance significantly when the arrays are large. Author: Nathan Bossart Reviewed by: Andres Freund, John Naylor Discussion: https://postgr.es/m/20220713170950.GA3116318%40nathanxps13 --- src/backend/utils/time/snapmgr.c | 28 +++++++--------------------- 1 file changed, 7 insertions(+), 21 deletions(-) diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c index 5bc2a15160..9b504c9745 100644 --- a/src/backend/utils/time/snapmgr.c +++ b/src/backend/utils/time/snapmgr.c @@ -56,6 +56,7 @@ #include "datatype/timestamp.h" #include "lib/pairingheap.h" #include "miscadmin.h" +#include "port/pg_lfind.h" #include "storage/predicate.h" #include "storage/proc.h" #include "storage/procarray.h" @@ -2284,8 +2285,6 @@ RestoreTransactionSnapshot(Snapshot snapshot, void *source_pgproc) bool XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot) { - uint32 i; - /* * Make a quick range check to eliminate most XIDs without looking at the * xip arrays. Note that this is OK even if we convert a subxact XID to @@ -2317,13 +2316,8 @@ XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot) if (!snapshot->suboverflowed) { /* we have full data, so search subxip */ - int32 j; - - for (j = 0; j < snapshot->subxcnt; j++) - { - if (TransactionIdEquals(xid, snapshot->subxip[j])) - return true; - } + if (pg_lfind32(xid, snapshot->subxip, snapshot->subxcnt)) + return true; /* not there, fall through to search xip[] */ } @@ -2344,16 +2338,11 @@ XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot) return false; } - for (i = 0; i < snapshot->xcnt; i++) - { - if (TransactionIdEquals(xid, snapshot->xip[i])) - return true; - } + if (pg_lfind32(xid, snapshot->xip, snapshot->xcnt)) + return true; } else { - int32 j; - /* * In recovery we store all xids in the subxact array because it is by * far the bigger array, and we mostly don't know which xids are @@ -2383,11 +2372,8 @@ XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot) * indeterminate xid. We don't know whether it's top level or subxact * but it doesn't matter. If it's present, the xid is visible. */ - for (j = 0; j < snapshot->subxcnt; j++) - { - if (TransactionIdEquals(xid, snapshot->subxip[j])) - return true; - } + if (pg_lfind32(xid, snapshot->subxip, snapshot->subxcnt)) + return true; } return false; -- 2.25.1