Hi hackers,
I wrote a patch to resolve the subtransactions concurrency performance
problems when suboverflowed.

When we use more than PGPROC_MAX_CACHED_SUBXIDS(64) subtransactions per
transaction concurrency, it will lead to subtransactions performance
problems. 
All backend will be stuck at acquiring lock SubtransSLRULock.

The reproduce steps in PG master branch:

1, init a cluster, append postgresql.conf as below: 

max_connections = '2500'
max_files_per_process = '2000'
max_locks_per_transaction = '64'
max_parallel_maintenance_workers = '8'
max_parallel_workers = '60'
max_parallel_workers_per_gather = '6'
max_prepared_transactions = '15000'
max_replication_slots = '10'
max_wal_senders = '64'
max_worker_processes = '250'
shared_buffers = 8GB

2, create table and insert some records as below:

CREATE UNLOGGED TABLE contend (
    id integer,
    val integer NOT NULL
)
WITH (fillfactor='50');
 
INSERT INTO contend (id, val)
SELECT i, 0
FROM generate_series(1, 10000) AS i;
 
VACUUM (ANALYZE) contend;

3, The script subtrans_128.sql in attachment. use pgbench with
subtrans_128.sql as below.
pgbench  -d postgres -p 33800 -n -r -f subtrans_128.sql  -c 500 -j 500 -T
3600


4, After for a while, we can get the stuck result. We can query
pg_stat_activity. All backends wait event is SubtransSLRULock.
   We can use pert top and try find the root cause. The result of perf top
as below:
66.20%  postgres            [.] pg_atomic_compare_exchange_u32_impl
  29.30%  postgres            [.] pg_atomic_fetch_sub_u32_impl
   1.46%  postgres            [.] pg_atomic_read_u32
   1.34%  postgres            [.] TransactionIdIsCurrentTransactionId
   0.75%  postgres            [.] SimpleLruReadPage_ReadOnly
   0.14%  postgres            [.] LWLockAttemptLock
   0.14%  postgres            [.] LWLockAcquire
   0.12%  postgres            [.] pg_atomic_compare_exchange_u32
   0.09%  postgres            [.] HeapTupleSatisfiesMVCC
   0.06%  postgres            [.] heapgetpage
   0.03%  postgres            [.] sentinel_ok
   0.03%  postgres            [.] XidInMVCCSnapshot
   0.03%  postgres            [.] slot_deform_heap_tuple
   0.03%  postgres            [.] ExecInterpExpr
   0.02%  postgres            [.] AllocSetCheck
   0.02%  postgres            [.] HeapTupleSatisfiesVisibility
   0.02%  postgres            [.] LWLockRelease
   0.02%  postgres            [.] TransactionIdPrecedes
   0.02%  postgres            [.] SubTransGetParent
   0.01%  postgres            [.] heapgettup_pagemode
   0.01%  postgres            [.] CheckForSerializableConflictOutNeeded


After view the subtrans codes, it is easy to find that the global LWLock
SubtransSLRULock is the bottleneck of subtrans concurrently.

When a bakcend session assign more than PGPROC_MAX_CACHED_SUBXIDS(64)
subtransactions, we will get a snapshot with suboverflowed.
A suboverflowed snapshot does not contain all data required to determine
visibility, so PostgreSQL will occasionally have to resort to pg_subtrans. 
These pages are cached in shared buffers, but you can see the overhead of
looking them up in the high rank of SimpleLruReadPage_ReadOnly in the perf
output.

To resolve this performance problem, we think about a solution which cache
SubtransSLRU to local cache. 
First we can query parent transaction id from SubtransSLRU, and copy the
SLRU page to local cache page.
After that if we need query parent transaction id again, we can query it
from local cache directly.
It will reduce the number of acquire and release LWLock SubtransSLRULock
observably.

>From all snapshots, we can get the latest xmin. All transaction id which
precedes this xmin, it muse has been commited/abortd. 
Their parent/top transaction has been written subtrans SLRU. Then we can
cache the subtrans SLRU and copy it to local cache.

Use the same produce steps above, with our patch we cannot get the stuck
result.
Note that append our GUC parameter in postgresql.conf. This optimize is off
in default.
local_cache_subtrans_pages=128 

The patch is base on PG master branch
0d906b2c0b1f0d625ff63d9ace906556b1c66a68


Our project in  https://github.com/ADBSQL/AntDB, Welcome to follow us,
AntDB, AsiaInfo's PG-based distributed database product

Thanks
Pengcheng

Attachment: subtrans_128.sql
Description: Binary data

diff --git a/src/backend/access/rmgrdesc/standbydesc.c 
b/src/backend/access/rmgrdesc/standbydesc.c
index 01ee7ac6d2..9b047c863a 100644
--- a/src/backend/access/rmgrdesc/standbydesc.c
+++ b/src/backend/access/rmgrdesc/standbydesc.c
@@ -129,6 +129,8 @@ standby_desc_invalidations(StringInfo buf,
                        appendStringInfo(buf, " relmap db %u", msg->rm.dbId);
                else if (msg->id == SHAREDINVALSNAPSHOT_ID)
                        appendStringInfo(buf, " snapshot %u", msg->sn.relId);
+               else if (msg->id == SUBTRANS_INVALID_PAGE_ID)
+                       appendStringInfo(buf, " subtrans page %u", 
msg->spp.pageno);
                else
                        appendStringInfo(buf, " unrecognized id %d", msg->id);
        }
diff --git a/src/backend/access/transam/subtrans.c 
b/src/backend/access/transam/subtrans.c
index 6a8e521f89..339425f5b7 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -33,7 +33,9 @@
 #include "access/transam.h"
 #include "pg_trace.h"
 #include "utils/snapmgr.h"
-
+#include "utils/hsearch.h"
+#include "utils/memutils.h"
+#include "utils/inval.h"
 
 /*
  * Defines for SubTrans page sizes.  A page is the same BLCKSZ as is used
@@ -53,7 +55,23 @@
 
 #define TransactionIdToPage(xid) ((xid) / (TransactionId) 
SUBTRANS_XACTS_PER_PAGE)
 #define TransactionIdToEntry(xid) ((xid) % (TransactionId) 
SUBTRANS_XACTS_PER_PAGE)
+#define SubtransMaxUsageCount  25
+#define SubtransSubUsageCount  5
 
+typedef struct LocalBufferData
+{
+       int             page_id;                /* for debug and check */
+       bool    in_htab;                /* in SubtransLocalBuffHtab? */
+       uint8   usage_count;    /* for remove */
+       uint16  valid_offset;   /* how many entry is valid */
+       TransactionId   xids[SUBTRANS_XACTS_PER_PAGE];
+} LocalBufferData,*LocalBuffer;
+
+typedef struct LocalBufferHash
+{
+       int                     page_id;
+       LocalBuffer     lbuffer;
+} LocalBufferHash;
 
 /*
  * Link to shared-memory data structures for SUBTRANS control
@@ -62,10 +80,392 @@ static SlruCtlData SubTransCtlData;
 
 #define SubTransCtl  (&SubTransCtlData)
 
+static HTAB *SubtransLocalBuffHtab = NULL;
+static LocalBuffer subtrans_local_buffers = NULL;
+static LocalBuffer subtrans_local_buf_end;
+static LocalBuffer subtrans_local_buf_next;
+int local_cache_subtrans_page_num = 0;
+int slru_subtrans_page_num = 32;
 
 static int     ZeroSUBTRANSPage(int pageno);
 static bool SubTransPagePrecedes(int page1, int page2);
+static LocalBuffer SubtransAllocLocalBuffer(void);
+
+#define HASH_PAGE_NO(no_)      (no_)
+static uint32
+hash_page_no(const void *key, Size keysize)
+{
+       Assert(keysize == sizeof(uint32));
+       return HASH_PAGE_NO(*(uint32*)key);
+}
+
+/* create subtrans local htab */
+static HTAB*
+CreateSubtransLocalHtab(long nelem, LocalBuffer *header)
+{
+       HASHCTL         hctl;
+       HTAB       *htab;
+       LocalBuffer     lbuffers;
+       Assert(nelem > 0);
+
+       memset(&hctl, 0, sizeof(hctl));
+       hctl.keysize = sizeof(((LocalBufferHash*)0)->page_id);
+       hctl.entrysize = sizeof(LocalBufferHash);
+       hctl.hcxt = TopMemoryContext;
+       hctl.hash = hash_page_no;
+
+       htab = hash_create("hash subtrans local buffer",
+                                          nelem,
+                                          &hctl,
+                                          HASH_ELEM | HASH_CONTEXT | 
HASH_FUNCTION);
+       lbuffers = MemoryContextAllocExtended(TopMemoryContext,
+                                                                               
  sizeof(LocalBufferData) * nelem,
+                                                                               
  MCXT_ALLOC_HUGE|MCXT_ALLOC_NO_OOM|MCXT_ALLOC_ZERO);
+       if (lbuffers == NULL)
+       {
+               hash_destroy(htab);
+               ereport(ERROR,
+                               (errcode(ERRCODE_OUT_OF_MEMORY),
+                                errmsg("out of memory")));
+       }
+
+       *header = lbuffers;
+       return htab;
+}
+
+static int
+compare_lbuffer_index(const void *a, const void *b)
+{
+       LocalBuffer l = &subtrans_local_buffers[*(uint32*)a];
+       LocalBuffer r = &subtrans_local_buffers[*(uint32*)b];
+
+       StaticAssertStmt(sizeof(l->usage_count) == 1 && 
offsetof(LocalBufferData, usage_count) == offsetof(LocalBufferData, in_htab) + 
1,
+                                        "Subtrans LocalBufferData stmt error");
+       /* compare the field in_htab and usage_count in LocalBuffer */
+       return memcmp(&r->in_htab, &l->in_htab, 2);
+}
+
+/* handle reset GUC local_cache_subtrans_page_num */
+void
+SubtransResetGucCacheNum(int newval, void *extra)
+{
+       HTAB              *volatile htabNew;
+       LocalBufferData *volatile lbuffers_new;
+       LocalBuffer                             lbuffer;
+       HASH_SEQ_STATUS                 seq_status;
+       LocalBufferHash            *old_local_hash;
+       LocalBufferHash            *new_local_hash;
+       bool                                    found;
+
+       if (local_cache_subtrans_page_num == newval ||
+               SubtransLocalBuffHtab == NULL)
+               return;
+
+       if (local_cache_subtrans_page_num == 0 || newval == 0)
+       {
+               if (SubtransLocalBuffHtab != NULL)
+               {
+                       hash_destroy(SubtransLocalBuffHtab);
+                       SubtransLocalBuffHtab = NULL;
+                       pfree(subtrans_local_buffers);
+                       subtrans_local_buffers = NULL;
+                       subtrans_local_buf_end = NULL;
+                       subtrans_local_buf_next = NULL;
+               }
+               return;
+       }
+
+       htabNew = CreateSubtransLocalHtab(newval, (LocalBuffer*)&lbuffers_new);
+       PG_TRY();
+       {
+               /* when new value more than old value. */
+               if (newval > local_cache_subtrans_page_num)
+               {
+                       /* traverse the old htab, and copy it to new htab */
+                       hash_seq_init(&seq_status, SubtransLocalBuffHtab);
+                       while ((old_local_hash = hash_seq_search(&seq_status)) 
!= NULL)
+                       {
+                               Assert(old_local_hash->lbuffer->page_id == 
old_local_hash->page_id);
+                               new_local_hash = 
hash_search_with_hash_value(htabNew,
+                                                                               
                                         &old_local_hash->page_id,
+                                                                               
                                         HASH_PAGE_NO(old_local_hash->page_id),
+                                                                               
                                         HASH_ENTER,
+                                                                               
                                         &found);
+                               Assert(found == false);
+                               new_local_hash->lbuffer = 
&lbuffers_new[old_local_hash->lbuffer - subtrans_local_buffers];
+                       }
+                       memcpy(lbuffers_new, subtrans_local_buffers, 
sizeof(lbuffers_new[0]) * local_cache_subtrans_page_num);
+               }else
+               {
+                       uint32  i,index;
+                       uint32 *indexes = palloc(sizeof(uint32) * 
local_cache_subtrans_page_num);
+                       for (i=0;i<local_cache_subtrans_page_num;++i)
+                               indexes[i] = i;
+                       /* qsort the local buffer index descending order, 
according in_htab and usage_count in local buffer, */
+                       qsort(indexes, local_cache_subtrans_page_num, 
sizeof(indexes[0]), compare_lbuffer_index);
+
+                       /* copy the older local buffer to the new local buffer. 
*/
+                       for (i=0;i<newval;++i)
+                       {
+                               index = indexes[i];
+                               memcpy(&lbuffers_new[i], 
&subtrans_local_buffers[index], sizeof(lbuffers_new[0]));
+                       }
+
+                       /* add new local buffer to new htab. */
+                       for (i=0;i<newval;++i)
+                       {
+                               lbuffer = &lbuffers_new[i];
+                               if (lbuffer->in_htab == false)
+                                       break;
+                               new_local_hash = 
hash_search_with_hash_value(htabNew,
+                                                                               
                                         &lbuffer->page_id,
+                                                                               
                                         HASH_PAGE_NO(lbuffer->page_id),
+                                                                               
                                         HASH_ENTER,
+                                                                               
                                         &found);
+                               Assert(found == false);
+                               new_local_hash->lbuffer = lbuffer;
+                       }
+                       pfree(indexes);
+               }
+       }PG_CATCH();
+       {
+               hash_destroy(htabNew);
+               pfree(lbuffers_new);
+               PG_RE_THROW();
+       }PG_END_TRY();
+
+       /* update subtrans_local_buffers, subtrans_local_buf_next and 
subtrans_local_buf_end. */
+       subtrans_local_buf_next = &lbuffers_new[subtrans_local_buf_next - 
subtrans_local_buffers];
+       if (subtrans_local_buf_next - lbuffers_new >= newval)
+               subtrans_local_buf_next = lbuffers_new;
+       subtrans_local_buf_end = &lbuffers_new[newval];
+
+       hash_destroy(SubtransLocalBuffHtab);
+       SubtransLocalBuffHtab = htabNew;
+       pfree(subtrans_local_buffers);
+       subtrans_local_buffers = lbuffers_new;
+}
+
+/*
+ * create local subtrans hash table.
+ */
+static void
+SubtransCreateLocalBufHtab(void)
+{
+       StaticAssertStmt(SUBTRANS_XACTS_PER_PAGE <= UINT16_MAX,
+                                        "must change type of 
LocalBufferData::valid_offset");
+
+       Assert(local_cache_subtrans_page_num > 0);
+       SubtransLocalBuffHtab = 
CreateSubtransLocalHtab(local_cache_subtrans_page_num,
+                                                                               
                        &subtrans_local_buffers);
+       subtrans_local_buf_end = 
&subtrans_local_buffers[local_cache_subtrans_page_num];
+       subtrans_local_buf_next = subtrans_local_buffers;
+}
+
+/* According xmin and pageno, calculate the valid off set */
+/* All xids which precedes valid off set in this page, we can get its parent 
xid */
+static inline uint16
+SubtransGetPageValidOffset(int pageno)
+{
+       TransactionId xmin;
+       TransactionId page_xlast;
+       TransactionId page_xfirst;
+
+       /* The xid start and end in this page */
+       page_xfirst = ((TransactionId)pageno) * SUBTRANS_XACTS_PER_PAGE;
+       page_xlast = page_xfirst + (SUBTRANS_XACTS_PER_PAGE - 1);
+
+       xmin = GetSnapshotsXmin();
+       if (xmin == InvalidTransactionId ||
+               TransactionIdPrecedes(xmin, page_xfirst))
+       {
+               return 0;
+       }else if (TransactionIdPrecedes(page_xlast, xmin))
+       {
+               return SUBTRANS_XACTS_PER_PAGE;
+       }
+       return xmin - page_xfirst;
+}
+
+/*
+ * Allocate a local buffer from local buffer array.
+ * If there are unused local buffer left, we can use it directly.
+ * If all local buffer have beed used, pick a victim and reomve it from htab.
+ */
+static LocalBuffer
+SubtransAllocLocalBuffer(void)
+{
+refind_:
+       if(unlikely(subtrans_local_buf_next >= subtrans_local_buf_end))
+               subtrans_local_buf_next = subtrans_local_buffers;
+
+       /* Get a unused local buffer, or get a local buffer which usage_count 
equals 0 */
+       if (subtrans_local_buf_next->in_htab == false ||
+               subtrans_local_buf_next->usage_count == 0)
+       {
+               LocalBuffer result = subtrans_local_buf_next++;
+               if (result->in_htab)
+               {
+                       /* if this local buffer has been used beofre, we should 
remove it from htab. */
+                       LocalBufferHash *item PG_USED_FOR_ASSERTS_ONLY;
+                       item = 
hash_search_with_hash_value(SubtransLocalBuffHtab,
+                                                                               
           &result->page_id,
+                                                                               
           HASH_PAGE_NO(result->page_id),
+                                                                               
           HASH_REMOVE,
+                                                                               
           NULL);
+                       Assert(item && item->lbuffer == result);
+               }
+               result->in_htab = false;
+               return result;
+       }
+
+       /* When usage_count more than SubtransSubUsageCount, minus 
SubtransSubUsageCount. */
+       /* Then asap get a victim page which usage_count equals 0 */
+       if (subtrans_local_buf_next->usage_count > SubtransSubUsageCount)
+               subtrans_local_buf_next->usage_count -= SubtransSubUsageCount;
+       else
+               subtrans_local_buf_next->usage_count = 0;
+       ++subtrans_local_buf_next;
+       goto refind_;
+}
+
+//#define DEBUG_LBUF_UPDATE    1
+#ifdef DEBUG_LBUF_UPDATE
+static void
+DEBUG_LOG_LBUF_UPDATE(LocalBuffer lbuffer, TransactionId xid)
+{
+       static StringInfoData msg = {.data = NULL};
+       int             i;
+
+       if (errstart(LOG, __FILE__, __LINE__, PG_FUNCNAME_MACRO, TEXTDOMAIN))
+       {
+
+               if (unlikely(msg.data == NULL))
+               {
+                       MemoryContext old_contex = 
MemoryContextSwitchTo(TopMemoryContext);
+                       initStringInfo(&msg);
+                       MemoryContextSwitchTo(old_contex);
+               }
+
+               resetStringInfo(&msg);
+               for (i=0;i<lengthof(lbuffer->xids);++i)
+                       appendStringInfo(&msg, " %u", lbuffer->xids[i]);
+
+               errfinish(errmsg("SUBTRANS UPDATE page=%d(%u)%s", 
lbuffer->page_id, xid, msg.data));
+       }
+}
+#else
+#define DEBUG_LOG_LBUF_UPDATE(lbuffer, xid)    ((void)true)
+#endif
+
+/*
+ * According xid and pageid, read subtrans and copy it to local buffer.
+ * According xid and set the valid offset for local page.
+ * valid offset, it means if the transaction id precedes valid offset,
+ * we can get its parend id from local buffer.
+ */
+static inline void
+SubtransSyncLocalBuffer(LocalBuffer lbuffer, TransactionId xid)
+{
+       int             slotno;
+       uint16  valid_offset;
+       Assert(TransactionIdToPage(xid) == lbuffer->page_id);
+
+       valid_offset = SubtransGetPageValidOffset(lbuffer->page_id);
+
+       /* read the subtrans cache and copy to local buffer. */
+       slotno = SimpleLruReadPage_ReadOnly(SubTransCtl, lbuffer->page_id, xid);
+       memcpy(lbuffer->xids, SubTransCtl->shared->page_buffer[slotno], BLCKSZ);
+       LWLockRelease(SubtransSLRULock);
+
+       /* update local buffer off set */
+       if (valid_offset > lbuffer->valid_offset)
+               lbuffer->valid_offset = valid_offset;
+
+       DEBUG_LOG_LBUF_UPDATE(lbuffer, xid);
+}
+
+/*
+ * According pageno and xid, try to get local buffer.
+ * If local cache miss, read subtrans SLRU cache and copy all pge to loal 
buffer. 
+ */
+static inline LocalBuffer
+SubtransReadLocalBuffer(int pageno, TransactionId xid, bool force_load, bool 
*foundPtr)
+{
+       LocalBufferHash *item;
+       LocalBuffer             lbuffer;
+       Assert(SubtransLocalBuffHtab != NULL);
+
+       item = hash_search_with_hash_value(SubtransLocalBuffHtab,
+                                                                          
&pageno,
+                                                                          
HASH_PAGE_NO(pageno),
+                                                                          
HASH_FIND,
+                                                                          
foundPtr);
+       if (item == NULL)
+       {
+               bool                    found;
+
+               if (force_load == false)
+                       return NULL;
+
+               lbuffer = SubtransAllocLocalBuffer();
+               Assert(lbuffer->in_htab == false);
+
+               lbuffer->page_id = pageno;
+               lbuffer->valid_offset = 0;
+               /* sync sutrans SLRU buffer to local buffer */
+               SubtransSyncLocalBuffer(lbuffer, xid);
+
+               /* Add local buffer to htab*/
+               item = hash_search_with_hash_value(SubtransLocalBuffHtab,
+                                                                               
   &pageno,
+                                                                               
   HASH_PAGE_NO(pageno),
+                                                                               
   HASH_ENTER,
+                                                                               
   &found);
+               Assert(found == false);
+               item->lbuffer = lbuffer;
+               lbuffer->in_htab = true;
+               lbuffer->usage_count = 1;
+       }else
+       {
+               lbuffer = item->lbuffer;
+               Assert(lbuffer->page_id == pageno && item->page_id == pageno);
+
+               /* update local buffer usage_count */
+               if (lbuffer->usage_count < SubtransMaxUsageCount)
+                       ++(lbuffer->usage_count);
+       }
 
+       return lbuffer;
+}
+
+/*
+ * Remove the local buffer accoding pageno.
+ */
+void SubtransRemoveLocalPage(int pageno)
+{
+       LocalBuffer             lbuffer;
+       LocalBufferHash *item PG_USED_FOR_ASSERTS_ONLY;
+
+       if (SubtransLocalBuffHtab == NULL)
+               return;
+
+       for (lbuffer=subtrans_local_buffers; lbuffer<subtrans_local_buf_end; 
lbuffer++)
+       {
+               if (lbuffer->in_htab &&
+                       SubTransPagePrecedes(lbuffer->page_id, pageno))
+               {
+                       item = 
hash_search_with_hash_value(SubtransLocalBuffHtab,
+                                                                               
           &lbuffer->page_id,
+                                                                               
           HASH_PAGE_NO(lbuffer->page_id),
+                                                                               
           HASH_REMOVE,
+                                                                               
           NULL);
+                       Assert(item && item->lbuffer == lbuffer);
+                       lbuffer->in_htab = false;
+                       lbuffer->usage_count = 0;
+               }
+       }
+}
 
 /*
  * Record the parent of a subtransaction in the subtrans log.
@@ -77,6 +477,7 @@ SubTransSetParent(TransactionId xid, TransactionId parent)
        int                     entryno = TransactionIdToEntry(xid);
        int                     slotno;
        TransactionId *ptr;
+       LocalBuffer     lbuffer = NULL;
 
        Assert(TransactionIdIsValid(parent));
        Assert(TransactionIdFollows(xid, parent));
@@ -99,7 +500,23 @@ SubTransSetParent(TransactionId xid, TransactionId parent)
                SubTransCtl->shared->page_dirty[slotno] = true;
        }
 
+       if (SubtransLocalBuffHtab)
+       {
+               /* get the local buffer according pageno and xid */
+               lbuffer = SubtransReadLocalBuffer(pageno, xid, false, NULL);
+               if (lbuffer != NULL)
+                       memcpy(lbuffer->xids, 
SubTransCtl->shared->page_buffer[slotno], BLCKSZ);
+       }
+
        LWLockRelease(SubtransSLRULock);
+
+       if (lbuffer != NULL)
+       {
+               Assert(lbuffer->page_id == pageno);
+               /* Update local valid off set */
+               lbuffer->valid_offset = SubtransGetPageValidOffset(pageno);
+               DEBUG_LOG_LBUF_UPDATE(lbuffer, xid);
+       }
 }
 
 /*
@@ -113,6 +530,7 @@ SubTransGetParent(TransactionId xid)
        int                     slotno;
        TransactionId *ptr;
        TransactionId parent;
+       LocalBuffer     lbuffer = NULL;
 
        /* Can't ask about stuff that might not be around anymore */
        Assert(TransactionIdFollowsOrEquals(xid, TransactionXmin));
@@ -121,6 +539,26 @@ SubTransGetParent(TransactionId xid)
        if (!TransactionIdIsNormal(xid))
                return InvalidTransactionId;
 
+       /* try to query local buffer first, if enable local subtrans optimize. 
*/
+       if (local_cache_subtrans_page_num > 0)
+       {
+               bool    found;
+               if (unlikely(SubtransLocalBuffHtab == NULL))
+                       SubtransCreateLocalBufHtab();
+
+               /* get the local buffer from htab or sync from SLRU cache */
+               lbuffer = SubtransReadLocalBuffer(pageno, xid, true, &found);
+               if (found == false ||                                           
                /* from shared memory */
+                       entryno < lbuffer->valid_offset ||                      
        /* or make sure value is valid */
+                       TransactionIdIsValid(lbuffer->xids[entryno]))   /* or 
parent is valid */
+                       return lbuffer->xids[entryno];
+
+               /* sync local buffer from shared memory */
+               SubtransSyncLocalBuffer(lbuffer, xid);
+
+               return lbuffer->xids[entryno];
+       }
+
        /* lock is acquired by SimpleLruReadPage_ReadOnly */
 
        slotno = SimpleLruReadPage_ReadOnly(SubTransCtl, pageno, xid);
@@ -155,6 +593,56 @@ SubTransGetTopmostTransaction(TransactionId xid)
        /* Can't ask about stuff that might not be around anymore */
        Assert(TransactionIdFollowsOrEquals(xid, TransactionXmin));
 
+       /* when enable local subtrans optimize */
+       if (local_cache_subtrans_page_num > 0)
+       {
+               LocalBuffer     lbuffer = NULL;
+               int                     pageno;
+               int                     entryno;
+               bool            found;
+
+               /* Bootstrap and frozen XIDs have no parent */
+               if (!TransactionIdIsNormal(xid))
+                       return InvalidTransactionId;
+
+               if (unlikely(SubtransLocalBuffHtab == NULL))
+                       SubtransCreateLocalBufHtab();
+
+               while (TransactionIdIsValid(parentXid))
+               {
+                       previousXid = parentXid;
+                       if (TransactionIdPrecedes(parentXid, TransactionXmin))
+                               goto end_get_;
+
+                       entryno = TransactionIdToEntry(parentXid);
+                       pageno = TransactionIdToPage(parentXid);
+                       if (lbuffer == NULL ||
+                               lbuffer->page_id != pageno)
+                               lbuffer = SubtransReadLocalBuffer(pageno, 
parentXid, true, &found);
+
+                       Assert(lbuffer->page_id == pageno);
+                       if (found == true &&                                    
                        /* from local cache */
+                               entryno >= lbuffer->valid_offset &&             
                /* not make sure value is valid */
+                               !TransactionIdIsValid(lbuffer->xids[entryno]))  
/* parent is invalid */
+                       {
+                               /* sync from shared memory */
+                               SubtransSyncLocalBuffer(lbuffer, parentXid);
+                       }
+
+                       parentXid = lbuffer->xids[entryno];
+
+                       /*
+                        * By convention the parent xid gets allocated first, 
so should always
+                        * precede the child xid. Anything else points to a 
corrupted data
+                        * structure that could lead to an infinite loop, so 
exit.
+                        */
+                       if (!TransactionIdPrecedes(parentXid, previousXid))
+                               elog(ERROR, "pg_subtrans contains invalid 
entry: xid %u points to parent xid %u",
+                                        previousXid, parentXid);
+               }
+               goto end_get_;
+       }
+
        while (TransactionIdIsValid(parentXid))
        {
                previousXid = parentXid;
@@ -172,6 +660,7 @@ SubTransGetTopmostTransaction(TransactionId xid)
                                 previousXid, parentXid);
        }
 
+end_get_:
        Assert(TransactionIdIsValid(previousXid));
 
        return previousXid;
@@ -184,14 +673,14 @@ SubTransGetTopmostTransaction(TransactionId xid)
 Size
 SUBTRANSShmemSize(void)
 {
-       return SimpleLruShmemSize(NUM_SUBTRANS_BUFFERS, 0);
+       return SimpleLruShmemSize(slru_subtrans_page_num, 0);
 }
 
 void
 SUBTRANSShmemInit(void)
 {
        SubTransCtl->PagePrecedes = SubTransPagePrecedes;
-       SimpleLruInit(SubTransCtl, "Subtrans", NUM_SUBTRANS_BUFFERS, 0,
+       SimpleLruInit(SubTransCtl, "Subtrans", slru_subtrans_page_num, 0,
                                  SubtransSLRULock, "pg_subtrans",
                                  LWTRANCHE_SUBTRANS_BUFFER, SYNC_HANDLER_NONE);
        SlruPagePrecedesUnitTests(SubTransCtl, SUBTRANS_XACTS_PER_PAGE);
@@ -351,6 +840,7 @@ TruncateSUBTRANS(TransactionId oldestXact)
        cutoffPage = TransactionIdToPage(oldestXact);
 
        SimpleLruTruncate(SubTransCtl, cutoffPage);
+       CacheInvalSubtransPage(cutoffPage);
 }
 
 
diff --git a/src/backend/utils/cache/inval.c b/src/backend/utils/cache/inval.c
index 9352c68090..bf49943d7e 100644
--- a/src/backend/utils/cache/inval.c
+++ b/src/backend/utils/cache/inval.c
@@ -98,6 +98,7 @@
 
 #include "access/htup_details.h"
 #include "access/xact.h"
+#include "access/subtrans.h"
 #include "catalog/catalog.h"
 #include "catalog/pg_constraint.h"
 #include "miscadmin.h"
@@ -668,10 +669,33 @@ LocalExecuteInvalidationMessage(SharedInvalidationMessage 
*msg)
                else if (msg->sn.dbId == MyDatabaseId)
                        InvalidateCatalogSnapshot();
        }
+       else if (msg->id == SUBTRANS_INVALID_PAGE_ID)
+       {
+               SubtransRemoveLocalPage(msg->spp.pageno);
+       }
        else
                elog(FATAL, "unrecognized SI message ID: %d", msg->id);
 }
 
+/*
+ *             CacheInvalSubtransPage
+ *
+ *             when truncate SUBTRANS SLRU page, send invalid msg to all 
backends.
+ *             Then all backends can remove subtrans from local buffer.
+ */
+void
+CacheInvalSubtransPage(int pageno)
+{
+       SharedInvalidationMessage msg;
+
+       msg.spp.id = SUBTRANS_INVALID_PAGE_ID;
+       msg.spp.pageno = pageno;
+       /* check AddCatcacheInvalidationMessage() for an explanation */
+       VALGRIND_MAKE_MEM_DEFINED(&msg, sizeof(msg));
+
+       SendSharedInvalidMessages(&msg, 1);
+}
+
 /*
  *             InvalidateSystemCaches
  *
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 467b0fd6fe..2979dc8ba4 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -41,6 +41,7 @@
 #include "access/twophase.h"
 #include "access/xact.h"
 #include "access/xlog_internal.h"
+#include "access/subtrans.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_authid.h"
 #include "catalog/storage.h"
@@ -2673,6 +2674,28 @@ static struct config_int ConfigureNamesInt[] =
                NULL, NULL, NULL
        },
 
+       {
+               {"local_cache_subtrans_pages", PGC_SIGHUP, RESOURCES_MEM,
+                       gettext_noop("Optimize subtrans perfermance, local 
cache subtrans page number from SLRU."),
+                       NULL,
+                       GUC_UNIT_BLOCKS
+               },
+               &local_cache_subtrans_page_num,
+               0, 0, UINT32_MAX/(BLCKSZ/sizeof(TransactionId))/2,
+               NULL, SubtransResetGucCacheNum, NULL
+       },
+
+       {
+               {"slru_subtrans_pages", PGC_POSTMASTER, RESOURCES_MEM,
+                       gettext_noop("Sets the number of shared memory buffers 
used for subtrans SLRU."),
+                       NULL,
+                       GUC_UNIT_BLOCKS
+               },
+               &slru_subtrans_page_num,
+               32, 32, INT_MAX / 2,
+               NULL, NULL, NULL
+       },
+
        /*
         * See also CheckRequiredParameterValues() if this parameter changes
         */
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 2968c7f7b7..bf35d2db87 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -901,6 +901,40 @@ xmin_cmp(const pairingheap_node *a, const pairingheap_node 
*b, void *arg)
                return 0;
 }
 
+/*
+ * Get the latest xmin from all snapshot. 
+ * It means all transaction id which precedes xmin has been commited/aborted.
+ */
+TransactionId
+GetSnapshotsXmin(void)
+{
+       TransactionId xmin = TransactionXmin;
+       Snapshot snaps[] = {CurrentSnapshot, FirstXactSnapshot, 
SecondarySnapshot, CatalogSnapshot, HistoricSnapshot};
+       uint32 i;
+
+       ActiveSnapshotElt *asnap = ActiveSnapshot;
+       while (asnap != NULL)
+       {
+               if (xmin == InvalidTransactionId ||
+                       TransactionIdPrecedes(xmin, asnap->as_snap->xmin))
+                       xmin = asnap->as_snap->xmin;
+               asnap = asnap->as_next;
+       }
+
+       for (i=lengthof(snaps); i>0;)
+       {
+               --i;
+               if (snaps[i] != NULL &&
+                       (xmin == InvalidTransactionId ||
+                        TransactionIdPrecedes(xmin, snaps[i]->xmin)))
+               {
+                       xmin = snaps[i]->xmin;
+               }
+       }
+
+       return xmin;
+}
+
 /*
  * SnapshotResetXmin
  *
diff --git a/src/include/access/subtrans.h b/src/include/access/subtrans.h
index d0ab44ae82..0eb9ab3971 100644
--- a/src/include/access/subtrans.h
+++ b/src/include/access/subtrans.h
@@ -11,8 +11,10 @@
 #ifndef SUBTRANS_H
 #define SUBTRANS_H
 
+/* GUC variables */
 /* Number of SLRU buffers to use for subtrans */
-#define NUM_SUBTRANS_BUFFERS   32
+extern int slru_subtrans_page_num;
+extern int local_cache_subtrans_page_num;
 
 extern void SubTransSetParent(TransactionId xid, TransactionId parent);
 extern TransactionId SubTransGetParent(TransactionId xid);
@@ -25,5 +27,6 @@ extern void StartupSUBTRANS(TransactionId oldestActiveXID);
 extern void CheckPointSUBTRANS(void);
 extern void ExtendSUBTRANS(TransactionId newestXact);
 extern void TruncateSUBTRANS(TransactionId oldestXact);
-
+extern void SubtransRemoveLocalPage(int pageno);
+extern void SubtransResetGucCacheNum(int newval, void *extra);
 #endif                                                 /* SUBTRANS_H */
diff --git a/src/include/storage/sinval.h b/src/include/storage/sinval.h
index f03dc23b14..4ac411c536 100644
--- a/src/include/storage/sinval.h
+++ b/src/include/storage/sinval.h
@@ -103,6 +103,14 @@ typedef struct
 
 #define SHAREDINVALSNAPSHOT_ID (-5)
 
+typedef struct
+{
+       int8            id;                             /* type field --- must 
be first */
+       int                     pageno;                 /* page no */
+} SubtransInvalPageMsg;
+
+#define SUBTRANS_INVALID_PAGE_ID       (-6)
+
 typedef struct
 {
        int8            id;                             /* type field --- must 
be first */
@@ -119,6 +127,7 @@ typedef union
        SharedInvalSmgrMsg sm;
        SharedInvalRelmapMsg rm;
        SharedInvalSnapshotMsg sn;
+       SubtransInvalPageMsg spp;
 } SharedInvalidationMessage;
 
 
diff --git a/src/include/utils/inval.h b/src/include/utils/inval.h
index 770672890b..8adb1f37bb 100644
--- a/src/include/utils/inval.h
+++ b/src/include/utils/inval.h
@@ -50,6 +50,8 @@ extern void CacheInvalidateRelcacheByRelid(Oid relid);
 
 extern void CacheInvalidateSmgr(RelFileNodeBackend rnode);
 
+extern void CacheInvalSubtransPage(int pageno);
+
 extern void CacheInvalidateRelmap(Oid databaseId);
 
 extern void CacheRegisterSyscacheCallback(int cacheid,
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 44539fe15a..16df0ea1a2 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -125,6 +125,8 @@ extern void UnregisterSnapshot(Snapshot snapshot);
 extern Snapshot RegisterSnapshotOnOwner(Snapshot snapshot, ResourceOwner 
owner);
 extern void UnregisterSnapshotFromOwner(Snapshot snapshot, ResourceOwner 
owner);
 
+extern TransactionId GetSnapshotsXmin(void);
+
 extern void AtSubCommit_Snapshot(int level);
 extern void AtSubAbort_Snapshot(int level);
 extern void AtEOXact_Snapshot(bool isCommit, bool resetXmin);

Reply via email to