Thank you both for reviewing!

On Tue, Jul 4, 2023 at 4:24AM Alvaro Herrera <alvhe...@alvh.no-ip.org> wrote:

> Hmm, yeah, I remember being bit bothered by this repeated
> initialization. Your patch looks reasonable to me. I would set
> bistate->bs_rmAccess to NULL in the cleanup callback, just to be sure.
> Also, please add comments atop these two new functions, to explain what
> they are.

Done. Also set bistate->bs_desc = NULL in the callback, and added comments.


On Tue, Jul 4, 2023 at 4:59AM Tomas Vondra
<tomas.von...@enterprisedb.com> wrote:

> Yeah. I wonder how much of that runtime is the generate_series(),
> though. What's the speedup if that part is subtracted. It's guaranteed
> to be even more significant, but by how much?

When trying COPY, I got tripped up by the following:

We get a buffer leak WARNING for the meta page and a revmap page.

WARNING:  buffer refcount leak: [094] (rel=base/156912/206068,
blockNum=1, flags=0x83000000, refcount=1 1)
WARNING:  buffer refcount leak: [093] (rel=base/156912/206068,
blockNum=0, flags=0x83000000, refcount=1 1)

PrintBufferLeakWarning bufmgr.c:3240
ResourceOwnerReleaseInternal resowner.c:554
ResourceOwnerRelease resowner.c:494
PortalDrop portalmem.c:563
exec_simple_query postgres.c:1284

These buffers get released during this resowner release, and then we
crash with:

TRAP: failed Assert("bufnum <= NBuffers"), File:
"../../../../src/include/storage/bufmgr.h", Line: 305, PID: 86833
postgres: pivotal test4 [local] COPY(ExceptionalCondition+0xbb)[0x5572b55bcc79]
postgres: pivotal test4 [local] COPY(+0x61ccfc)[0x5572b537dcfc]
postgres: pivotal test4 [local] COPY(ReleaseBuffer+0x19)[0x5572b5384db2]
postgres: pivotal test4 [local] COPY(brinRevmapTerminate+0x1e)[0x5572b4e3fd39]
postgres: pivotal test4 [local] COPY(+0xcfc44)[0x5572b4e30c44]
postgres: pivotal test4 [local] COPY(+0x89e7f2)[0x5572b55ff7f2]
postgres: pivotal test4 [local] COPY(MemoryContextDelete+0xd7)[0x5572b55ff683]
postgres: pivotal test4 [local] COPY(PortalDrop+0x374)[0x5572b5602dc7]

Unfortunately, when we do COPY, the MemoryContext in which makeIndexInfo()
gets called is PortalContext, and that is what ends up in ii_Context.
Furthermore, in PortalDrop() the resource owner is released before the
MemoryContexts are cleaned up.
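
To illustrate, this is roughly the failing pattern, reconstructed to match
the backtraces above (the names follow the v2 patch, not the exact v1 code):

/*
 * Registering the cleanup callback on ii_Context (= PortalContext under
 * COPY) fires too late: PortalDrop() runs ResourceOwnerRelease() first,
 * which already drops the still-pinned revmap buffers (hence the refcount
 * leak WARNINGs), and only afterwards deletes PortalContext, so the
 * callback's brinRevmapTerminate() ends up calling ReleaseBuffer() on
 * buffers that are already gone.
 */
oldcxt = MemoryContextSwitchTo(indexInfo->ii_Context);
bistate = palloc0(sizeof(BrinInsertState));
bistate->bs_desc = brin_build_desc(idxRel);
cb = palloc(sizeof(MemoryContextCallback));
cb->arg = bistate;
cb->func = brininsertCleanupCallback;
MemoryContextRegisterResetCallback(indexInfo->ii_Context, cb);
MemoryContextSwitchTo(oldcxt);
bistate->bs_rmAccess = brinRevmapInitialize(idxRel,
											&bistate->bs_pages_per_range,
											NULL);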

The CurrentMemoryContext when initialize_brin_insertstate() is called
depends on the call path: for CopyMultiInsertBufferFlush() ->
ExecInsertIndexTuples() it is PortalContext, whereas for CopyFrom() ->
ExecInsertIndexTuples() it is ExecutorState/ExprContext. So we can't
rely on it to register the callback either.

What we can do instead is create a dedicated MemoryContext to hold the
BrinInsertState and tie the callback to it, so that cleanup is not
affected by any of these variables. See the v2 patch attached. It passes
make installcheck-world and make installcheck -C src/test/modules/brin.

However, there is still one issue with the v2 patch: when we cancel
(Ctrl-C) a running COPY command, we get:
ERROR:  buffer 151 is not owned by resource owner TopTransaction

#4  0x0000559cbc54a934 in ResourceOwnerForgetBuffer
(owner=0x559cbd6fcf28, buffer=143) at resowner.c:997
#5  0x0000559cbc2c45e7 in UnpinBuffer (buf=0x7f8d4a8f3f80) at bufmgr.c:2390
#6  0x0000559cbc2c7e49 in ReleaseBuffer (buffer=143) at bufmgr.c:4488
#7  0x0000559cbbd82d53 in brinRevmapTerminate (revmap=0x559cbd7a03b8)
at brin_revmap.c:105
#8  0x0000559cbbd73c44 in brininsertCleanupCallback
(arg=0x559cbd7a5b68) at brin.c:168
#9  0x0000559cbc54280c in MemoryContextCallResetCallbacks
(context=0x559cbd7a5a50) at mcxt.c:506
#10 0x0000559cbc54269d in MemoryContextDelete (context=0x559cbd7a5a50)
at mcxt.c:421
#11 0x0000559cbc54273e in MemoryContextDeleteChildren
(context=0x559cbd69ae90) at mcxt.c:457
#12 0x0000559cbc54625c in AtAbort_Portals () at portalmem.c:850

Haven't found a way to fix this ^ yet.

Maybe there is a better way of doing our cleanup? I'm not sure. Would
love your input!

The other alternative for all this is to introduce new AM callbacks for
insert_begin and insert_end. That might be a tougher sell?
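
For concreteness, here is a rough sketch of what such callbacks could look
like; the names and the idea of driving them from ExecOpenIndices() /
ExecCloseIndices() are purely hypothetical, nothing like this exists today:

/* Hypothetical optional additions to IndexAmRoutine */
typedef void (*aminsertbegin_function) (Relation indexRelation,
										struct IndexInfo *indexInfo);
typedef void (*aminsertend_function) (Relation indexRelation,
									  struct IndexInfo *indexInfo);

/*
 * For BRIN, aminsertbegin would build the BrinDesc and open the revmap
 * once and stash them in indexInfo->ii_AmCache, and aminsertend would
 * call brinRevmapTerminate() at a well-defined point, before the portal's
 * resource owner releases the buffer pins, so no memory context callback
 * would be needed at all.
 */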

Now, to finally answer your question about the speedup without
generate_series(): we see an even bigger speedup, roughly 1.5x on this test!

seq 1 200000000 > /tmp/data.csv
\timing
DROP TABLE heap;
CREATE TABLE heap(i int);
CREATE INDEX ON heap USING brin(i) WITH (pages_per_range=1);
COPY heap FROM '/tmp/data.csv';

-- 3 runs (master 29cf61ade3f245aa40f427a1d6345287ef77e622)
COPY 200000000
Time: 205072.444 ms (03:25.072)
Time: 215380.369 ms (03:35.380)
Time: 203492.347 ms (03:23.492)

-- 3 runs (branch v2)

COPY 200000000
Time: 135052.752 ms (02:15.053)
Time: 135093.131 ms (02:15.093)
Time: 138737.048 ms (02:18.737)

Regards,
Soumyadeep (VMware)
From 54ba134f4afe9c4bac19e7d8fde31b9768dc23cd Mon Sep 17 00:00:00 2001
From: Soumyadeep Chakraborty <soumyadeep2...@gmail.com>
Date: Tue, 4 Jul 2023 11:50:35 -0700
Subject: [PATCH v2 1/1] Reuse revmap and brin desc in brininsert

brininsert() used to have code that performed per-tuple initialization
of the revmap. That had some overhead.
---
 src/backend/access/brin/brin.c | 89 +++++++++++++++++++++++++++++-----
 1 file changed, 78 insertions(+), 11 deletions(-)

diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 3c6a956eaa3..32b588af4da 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -58,6 +58,17 @@ typedef struct BrinBuildState
 	BrinMemTuple *bs_dtuple;
 } BrinBuildState;
 
+/*
+ * We use a BrinInsertState to capture running state spanning multiple
+ * brininsert invocations, within the same command.
+ */
+typedef struct BrinInsertState
+{
+	BrinRevmap *bs_rmAccess;
+	BrinDesc   *bs_desc;
+	BlockNumber bs_pages_per_range;
+}			BrinInsertState;
+
 /*
  * Struct used as "opaque" during index scans
  */
@@ -72,6 +83,8 @@ typedef struct BrinOpaque
 
 static BrinBuildState *initialize_brin_buildstate(Relation idxRel,
 												  BrinRevmap *revmap, BlockNumber pagesPerRange);
+static void brininsertCleanupCallback(void *arg);
+static BrinInsertState *initialize_brin_insertstate(Relation idxRel);
 static void terminate_brin_buildstate(BrinBuildState *state);
 static void brinsummarize(Relation index, Relation heapRel, BlockNumber pageRange,
 						  bool include_partial, double *numSummarized, double *numExisting);
@@ -140,6 +153,60 @@ brinhandler(PG_FUNCTION_ARGS)
 	PG_RETURN_POINTER(amroutine);
 }
 
+/*
+ * Callback to clean up the BrinInsertState once all tuple inserts are done.
+ */
+static void
+brininsertCleanupCallback(void *arg)
+{
+	BrinInsertState *bistate = (BrinInsertState *) arg;
+
+	/*
+	 * Clean up the revmap. Note that the brinDesc has already been cleaned up
+	 * as part of its own memory context.
+	 */
+	brinRevmapTerminate(bistate->bs_rmAccess);
+	bistate->bs_rmAccess = NULL;
+	bistate->bs_desc = NULL;
+}
+
+/*
+ * Initialize a BrinInsertState to maintain state to be used across multiple
+ * tuple inserts, within the same command.
+ */
+static BrinInsertState *
+initialize_brin_insertstate(Relation idxRel)
+{
+	BrinInsertState *bistate;
+	MemoryContextCallback *cb;
+	MemoryContext cxt;
+	MemoryContext oldcxt;
+
+	/*
+	 * Create private context for holding the BrinInsertState to ensure that we
+	 * clean up the revmap safely in the callback.
+	 */
+	cxt = AllocSetContextCreate(CurrentMemoryContext,
+								"brin insert cxt",
+								ALLOCSET_SMALL_SIZES);
+	oldcxt = MemoryContextSwitchTo(cxt);
+
+	bistate = palloc0(sizeof(BrinInsertState));
+	bistate->bs_desc = brin_build_desc(idxRel);
+	cb = palloc(sizeof(MemoryContextCallback));
+	cb->arg = bistate;
+	cb->func = brininsertCleanupCallback;
+	MemoryContextRegisterResetCallback(CurrentMemoryContext, cb);
+
+	MemoryContextSwitchTo(oldcxt);
+
+	bistate->bs_rmAccess = brinRevmapInitialize(idxRel,
+												&bistate->bs_pages_per_range,
+												NULL);
+
+	return bistate;
+}
+
 /*
  * A tuple in the heap is being inserted.  To keep a brin index up to date,
  * we need to obtain the relevant index tuple and compare its stored values
@@ -162,14 +229,23 @@ brininsert(Relation idxRel, Datum *values, bool *nulls,
 	BlockNumber pagesPerRange;
 	BlockNumber origHeapBlk;
 	BlockNumber heapBlk;
-	BrinDesc   *bdesc = (BrinDesc *) indexInfo->ii_AmCache;
+	BrinInsertState *bistate = (BrinInsertState *) indexInfo->ii_AmCache;
 	BrinRevmap *revmap;
+	BrinDesc   *bdesc;
 	Buffer		buf = InvalidBuffer;
 	MemoryContext tupcxt = NULL;
 	MemoryContext oldcxt = CurrentMemoryContext;
 	bool		autosummarize = BrinGetAutoSummarize(idxRel);
 
-	revmap = brinRevmapInitialize(idxRel, &pagesPerRange, NULL);
+	if (!bistate)
+	{
+		/* First time through in this statement? */
+		bistate = initialize_brin_insertstate(idxRel);
+		indexInfo->ii_AmCache = (void *) bistate;
+	}
+	revmap = bistate->bs_rmAccess;
+	bdesc = bistate->bs_desc;
+	pagesPerRange = bistate->bs_pages_per_range;
 
 	/*
 	 * origHeapBlk is the block number where the insertion occurred.  heapBlk
@@ -228,14 +304,6 @@ brininsert(Relation idxRel, Datum *values, bool *nulls,
 		if (!brtup)
 			break;
 
-		/* First time through in this statement? */
-		if (bdesc == NULL)
-		{
-			MemoryContextSwitchTo(indexInfo->ii_Context);
-			bdesc = brin_build_desc(idxRel);
-			indexInfo->ii_AmCache = (void *) bdesc;
-			MemoryContextSwitchTo(oldcxt);
-		}
 		/* First time through in this brininsert call? */
 		if (tupcxt == NULL)
 		{
@@ -306,7 +374,6 @@ brininsert(Relation idxRel, Datum *values, bool *nulls,
 		break;
 	}
 
-	brinRevmapTerminate(revmap);
 	if (BufferIsValid(buf))
 		ReleaseBuffer(buf);
 	MemoryContextSwitchTo(oldcxt);
-- 
2.34.1

