Hi Andres,

Thanks for taking a look.

On Fri, Jul 15, 2022 at 01:08:57PM -0700, Andres Freund wrote:
> Hm. Are there any contexts where we'd not want the potential for failing due
> to OOM?

I'm not sure about this one.

> I wonder if we additionally / alternatively could use a faster method of
> searching the array linearly, e.g. using SIMD.

I looked into using SIMD.  The patch is attached, but it is only intended
for benchmarking purposes and isn't anywhere close to being worth serious
review.  There may be a simpler/better way to implement the linear search,
but this one seemed to work well.  Overall, SIMD provided a decent
improvement.  I had to increase the number of writers quite a bit in order
to demonstrate where the hash tables began winning.  Here are the numbers:

    writers    head    simd    hash
        256     663     632     694
        512     530     618     637
        768     489     544     573
       1024     364     508     562
       2048     185     306     485
       4096     146     197     441

While it is unsurprising that the hash tables perform the best, there are
a couple of advantages to SIMD that might make that approach worth
considering.  For one, there's essentially no setup overhead (i.e., you
don't need to sort the array or build a hash table), so we can avoid
picking an arbitrary threshold and just have one code path.  Also, a SIMD
implementation of a linear search through an array of integers could
likely be reused elsewhere.

> Another thing that might be worth looking into is to sort the xip/subxip
> arrays into a binary-search optimized layout. That'd make the binary search
> faster, wouldn't require additional memory (a boolean indicating whether
> sorted somewhere, I guess), and would easily persist across copies of the
> snapshot.

I spent some time looking into this, but I haven't attempted to implement
it.  IIUC the most difficult part is sorting the array in place into the
special layout.  (A toy sketch of what the lookup half might look like is
below, after my signature.)

>> These hash tables are regarded as ephemeral; they only live in
>> process-local memory and are never rewritten, copied, or
>> serialized.
>
> What does rewriting refer to here?

I mean that a hash table created for one snapshot will not be cleared and
reused for another.

> Not convinced that's the right idea in case of copying. I think we often end
> up copying snapshots frequently, and building & allocating the hashed xids
> separately every time seems not great.

Right.  My concern with reusing the hash tables is that we'd need to
allocate much more space that would go largely unused in many cases.

>> +	snapshot->xiph = NULL;
>> +	snapshot->subxiph = NULL;
>
> Do we need separate hashes for these? ISTM that if we're overflowed then we
> don't need ->subxiph, and if not, then the action for an xid being in ->xip
> and ->subxiph is the same?

Do you mean that we can combine these into one hash table?  That might
work.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
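Here is the toy sketch mentioned above, as a standalone program rather than
a patch.  To be clear about what is assumed: the array must already have
been permuted into BFS ("Eytzinger") order, where the children of element i
live at 2*i+1 and 2*i+2; it uses plain integer comparison, so it ignores
XID wraparound; and XidInEytzinger is just a placeholder name.  The
in-place permutation itself, which I believe is the hard part, is not
shown, and this is untested beyond a quick sanity check.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t TransactionId;

/* search a BFS-ordered array by descending the implicit tree */
static bool
XidInEytzinger(TransactionId xid, const TransactionId *xip, uint32_t len)
{
	uint32_t	i = 0;

	while (i < len)
	{
		if (xip[i] == xid)
			return true;
		/* right child if xid sorts after this element, else left child */
		i = 2 * i + ((xip[i] < xid) ? 2 : 1);
	}

	return false;
}

int
main(void)
{
	/* BFS layout of the sorted array {10, 20, 30, 40, 50, 60, 70} */
	TransactionId xip[] = {40, 20, 60, 10, 30, 50, 70};

	printf("%d\n", XidInEytzinger(30, xip, 7));	/* prints 1 */
	printf("%d\n", XidInEytzinger(35, xip, 7));	/* prints 0 */
	return 0;
}

The appeal, as I understand it, is that the first few levels of the
implicit tree sit packed together at the front of the array, so the early
probes of every lookup touch the same handful of cache lines.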
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 5bc2a15160..25d1a3564c 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -45,6 +45,7 @@
  */
 #include "postgres.h"
 
+#include <nmmintrin.h>
 #include <sys/stat.h>
 #include <unistd.h>
 
@@ -2271,6 +2272,40 @@ RestoreTransactionSnapshot(Snapshot snapshot, void *source_pgproc)
 	SetTransactionSnapshot(snapshot, NULL, InvalidPid, source_pgproc);
 }
 
+static inline bool
+XidInXip(TransactionId xid, TransactionId *xip, uint32 len)
+{
+	__m128i		xids = _mm_set1_epi32(xid);
+	uint32		its = len & ~15;	/* round down to nearest multiple of 16 */
+	uint32		i;
+
+	for (i = 0; i < its; i += 16)
+	{
+		__m128i		xips1 = _mm_loadu_si128((__m128i *) &xip[i]);
+		__m128i		xips2 = _mm_loadu_si128((__m128i *) &xip[i + 4]);
+		__m128i		xips3 = _mm_loadu_si128((__m128i *) &xip[i + 8]);
+		__m128i		xips4 = _mm_loadu_si128((__m128i *) &xip[i + 12]);
+		__m128i		result1 = _mm_cmpeq_epi32(xids, xips1);
+		__m128i		result2 = _mm_cmpeq_epi32(xids, xips2);
+		__m128i		result3 = _mm_cmpeq_epi32(xids, xips3);
+		__m128i		result4 = _mm_cmpeq_epi32(xids, xips4);
+		__m128i		tmp1 = _mm_packs_epi32(result1, result2);
+		__m128i		tmp2 = _mm_packs_epi32(result3, result4);
+		__m128i		result = _mm_packs_epi16(tmp1, tmp2);
+
+		if (_mm_movemask_epi8(result) != 0)
+			return true;
+	}
+
+	while (i < len)
+	{
+		if (TransactionIdEquals(xid, xip[i++]))
+			return true;
+	}
+
+	return false;
+}
+
 /*
  * XidInMVCCSnapshot
  *		Is the given XID still-in-progress according to the snapshot?
@@ -2284,8 +2319,6 @@ RestoreTransactionSnapshot(Snapshot snapshot, void *source_pgproc)
 bool
 XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot)
 {
-	uint32		i;
-
 	/*
 	 * Make a quick range check to eliminate most XIDs without looking at the
 	 * xip arrays.  Note that this is OK even if we convert a subxact XID to
@@ -2317,13 +2350,8 @@ XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot)
 		if (!snapshot->suboverflowed)
 		{
 			/* we have full data, so search subxip */
-			int32		j;
-
-			for (j = 0; j < snapshot->subxcnt; j++)
-			{
-				if (TransactionIdEquals(xid, snapshot->subxip[j]))
-					return true;
-			}
+			if (XidInXip(xid, snapshot->subxip, snapshot->subxcnt))
+				return true;
 
 			/* not there, fall through to search xip[] */
 		}
@@ -2344,16 +2372,11 @@ XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot)
 				return false;
 		}
 
-		for (i = 0; i < snapshot->xcnt; i++)
-		{
-			if (TransactionIdEquals(xid, snapshot->xip[i]))
-				return true;
-		}
+		if (XidInXip(xid, snapshot->xip, snapshot->xcnt))
+			return true;
 	}
 	else
 	{
-		int32		j;
-
 		/*
 		 * In recovery we store all xids in the subxact array because it is by
 		 * far the bigger array, and we mostly don't know which xids are
@@ -2383,11 +2406,8 @@ XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot)
 		 * indeterminate xid.  We don't know whether it's top level or subxact
 		 * but it doesn't matter.  If it's present, the xid is visible.
 		 */
-		for (j = 0; j < snapshot->subxcnt; j++)
-		{
-			if (TransactionIdEquals(xid, snapshot->subxip[j]))
-				return true;
-		}
+		if (XidInXip(xid, snapshot->subxip, snapshot->subxcnt))
+			return true;
 	}
 
 	return false;