Re: Performance degradation on concurrent COPY into a single relation in PG16.

Andres Freund Fri, 13 Oct 2023 11:30:55 -0700

Hi,

On 2023-10-13 10:39:10 -0700, Andres Freund wrote:
> On 2023-10-12 09:24:19 -0700, Andres Freund wrote:
> > I kind of had hoped somebody would comment on the approach.  Given that 
> > nobody
> > has, I'll push the minimal fix of resetting the state in
> > ReleaseBulkInsertStatePin(), even though I think architecturally that's not
> > great.
> 
> I spent some time working on a test that shows the problem more cheaply than
> the case upthread. I think it'd be desirable to have a test that's likely to
> catch an issue like this fairly quickly. We've had other problems in this
> realm before - there's only a single test that fails if I remove the
> ReleaseBulkInsertStatePin() call, and I don't think that's guaranteed at all.
> 
> I'm a bit on the fence on how large to make the relation. For me the bug
> triggers when filling both relations up to the 3rd block, but how many rows
> that takes is somewhat dependent on space utilization on the page and stuff.
> 
> Right now the test uses data/desc.data and ends up with 328kB and 312kB in two
> partitions. Alternatively I could make the test create a new file to load with
> copy that has fewer rows than data/desc.data - I didn't see another data file
> that works conveniently and has fewer rows.  The copy is reasonably fast, even
> under something as expensive as rr (~60ms). So I'm inclined to just go with
> that?


Patch with fix and test attached (0001).

Given how easy a missing ReleaseBulkInsertStatePin() can cause corruption (not
due to this bug, but the issue fixed in b1ecb9b3fcf), I think we should
consider adding an assertion along the lines of 0002 to HEAD. Perhaps adding a
new bufmgr.c function to avoid having to get the fields in the buffer tag we
don't care about. Or perhaps we should just promote the check to an elog, we
already call BufferGetBlockNumber(), using BufferGetTag() instead doesn't cost
much more.

Greetings,

Andres Freund

>From e202cf5c3cae0551ae8fabd2629b4f9c6a0734d5 Mon Sep 17 00:00:00 2001
From: Andres Freund <[email protected]>
Date: Fri, 13 Oct 2023 10:54:16 -0700
Subject: [PATCH v1 1/2] Fix bulk table extension when copying into multiple
 partitions

When COPYing into a partitioned table that does now permit the use of
table_multi_insert(), we could error out with
  ERROR: could not read block NN in file "base/...": read only 0 of 8192 bytes

because BulkInsertState->next_free was not reset between partitions. This
problem occured only when not able to use table_multi_insert(), as a dedicated
BulkInsertState for each partition is used in that case.

The bug was introduced in 00d1e02be24, but it was hard to hit at that point as
commonly bulk relation extension is not used when not using
table_multi_insert(). It became more likely after 82a4edabd27, which expanded
the use of bulk extension.

To fix the bug, reset the bulk relation extension state in BulkInsertState in
ReleaseBulkInsertStatePin(). That was added (in b1ecb9b3fcf) to tackle a very
similar issue.  Obviously the name is not quite correct, but there might be
external callers, and bulk insert state needs to be reset in precisely in the
situations that ReleaseBulkInsertStatePin() already needed to be called.

Medium term the better fix likely is to disallow reusing BulkInsertState
across relations.

Add a test that, without the fix, reproduces #18130 in most
configurations. The test also catches the problem fixed in b1ecb9b3fcf when
run with small shared_buffers.

Reported-by: Ivan Kolombet <[email protected]>
Analyzed-by: Tom Lane <[email protected]>
Analyzed-by: Andres Freund <[email protected]>
Bug: #18130
Discussion: https://postgr.es/m/18130-7a86a7356a75209d%40postgresql.org
Discussion: https://postgr.es/m/257696.1695670946%40sss.pgh.pa.us
Backpatch: 16-
---
 src/backend/access/heap/heapam.c   | 11 +++++++++
 src/test/regress/expected/copy.out | 37 ++++++++++++++++++++++++++++++
 src/test/regress/sql/copy.sql      | 37 ++++++++++++++++++++++++++++++
 3 files changed, 85 insertions(+)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 88a123d38a6..51eceaca6cf 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -1792,6 +1792,17 @@ ReleaseBulkInsertStatePin(BulkInsertState bistate)
 	if (bistate->current_buf != InvalidBuffer)
 		ReleaseBuffer(bistate->current_buf);
 	bistate->current_buf = InvalidBuffer;
+
+	/*
+	 * Despite the name, we also release information about bulk
+	 * inserts. Otherwise we can end up erroring out due to looking for free
+	 * space in ->next_free of one partition, even though ->next_free was set
+	 * when extending another partition. It's obviously also could be bad for
+	 * efficiency to look at existing blocks at offsets from another
+	 * partition, even if we don't error out.
+	 */
+	bistate->next_free = InvalidBlockNumber;
+	bistate->last_free = InvalidBlockNumber;
 }
 
 
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index a5912c13a8c..b48365ec981 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -257,3 +257,40 @@ ERROR:  value too long for type character varying(5)
 \.
 invalid command \.
 drop table oversized_column_default;
+--
+-- Create partitioned table that does not allow bulk insertions, to test bugs
+-- related to the reuse of BulkInsertState across partitions (only done when
+-- not using bulk insert).  Switching between partitions often makes it more
+-- likely to encounter these bugs, so we just switch on roughly every insert
+-- by having an even/odd number partition and inserting evenly distributed
+-- data.
+--
+CREATE TABLE parted_si (
+  id int not null,
+  data text not null,
+  -- prevent use of bulk insert by having a volatile function
+  rand float8 not null default random()
+)
+PARTITION BY LIST((id % 2));
+CREATE TABLE parted_si_p_even PARTITION OF parted_si FOR VALUES IN (0);
+CREATE TABLE parted_si_p_odd PARTITION OF parted_si FOR VALUES IN (1);
+-- Test that bulk relation extension handles reusing a single BulkInsertState
+-- across partitions.  Without the fix applied, this reliably reproduces
+-- #18130 unless shared_buffers is extremely small (preventing any use use of
+-- bulk relation extension). See
+-- https://postgr.es/m/18130-7a86a7356a75209d%40postgresql.org
+-- https://postgr.es/m/257696.1695670946%40sss.pgh.pa.us
+\set filename :abs_srcdir '/data/desc.data'
+COPY parted_si(id, data) FROM :'filename';
+-- An earlier bug (see commit b1ecb9b3fcf) could end up using a buffer from
+-- the wrong partition. This test is *not* guaranteed to trigger that bug, but
+-- does so when shared_buffers is small enough.  To test if we encountered the
+-- bug, check that the partition condition isn't violated.
+SELECT tableoid::regclass, id % 2 = 0 is_even, count(*) from parted_si GROUP BY 1, 2 ORDER BY 1;
+     tableoid     | is_even | count 
+------------------+---------+-------
+ parted_si_p_even | t       |  5000
+ parted_si_p_odd  | f       |  5000
+(2 rows)
+
+DROP TABLE parted_si;
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index 7fdb26d14f3..43d2e906dd9 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -283,3 +283,40 @@ copy oversized_column_default (col2) from stdin;
 copy oversized_column_default from stdin (default '');
 \.
 drop table oversized_column_default;
+
+
+--
+-- Create partitioned table that does not allow bulk insertions, to test bugs
+-- related to the reuse of BulkInsertState across partitions (only done when
+-- not using bulk insert).  Switching between partitions often makes it more
+-- likely to encounter these bugs, so we just switch on roughly every insert
+-- by having an even/odd number partition and inserting evenly distributed
+-- data.
+--
+CREATE TABLE parted_si (
+  id int not null,
+  data text not null,
+  -- prevent use of bulk insert by having a volatile function
+  rand float8 not null default random()
+)
+PARTITION BY LIST((id % 2));
+
+CREATE TABLE parted_si_p_even PARTITION OF parted_si FOR VALUES IN (0);
+CREATE TABLE parted_si_p_odd PARTITION OF parted_si FOR VALUES IN (1);
+
+-- Test that bulk relation extension handles reusing a single BulkInsertState
+-- across partitions.  Without the fix applied, this reliably reproduces
+-- #18130 unless shared_buffers is extremely small (preventing any use use of
+-- bulk relation extension). See
+-- https://postgr.es/m/18130-7a86a7356a75209d%40postgresql.org
+-- https://postgr.es/m/257696.1695670946%40sss.pgh.pa.us
+\set filename :abs_srcdir '/data/desc.data'
+COPY parted_si(id, data) FROM :'filename';
+
+-- An earlier bug (see commit b1ecb9b3fcf) could end up using a buffer from
+-- the wrong partition. This test is *not* guaranteed to trigger that bug, but
+-- does so when shared_buffers is small enough.  To test if we encountered the
+-- bug, check that the partition condition isn't violated.
+SELECT tableoid::regclass, id % 2 = 0 is_even, count(*) from parted_si GROUP BY 1, 2 ORDER BY 1;
+
+DROP TABLE parted_si;
-- 
2.38.0

>From 9297c5f00ff3230e5faf1c7dad5de7d2d5868ec9 Mon Sep 17 00:00:00 2001
From: Andres Freund <[email protected]>
Date: Fri, 13 Oct 2023 11:23:16 -0700
Subject: [PATCH v1 2/2] Assert buffer in ReadBufferBI() is for correct
 relation

Author:
Reviewed-by:
Discussion: https://postgr.es/m/
Backpatch:
---
 src/backend/access/heap/hio.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index caa62708aa5..ed9f72379ba 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -99,6 +99,17 @@ ReadBufferBI(Relation relation, BlockNumber targetBlock,
 	/* If we have the desired block already pinned, re-pin and return it */
 	if (bistate->current_buf != InvalidBuffer)
 	{
+#ifdef USE_ASSERT_CHECKING
+		{
+			RelFileLocator rlocator;
+			ForkNumber forknum;
+			BlockNumber blknum;
+
+			BufferGetTag(bistate->current_buf, &rlocator, &forknum, &blknum);
+			Assert(RelFileLocatorEquals(rlocator, relation->rd_locator));
+		}
+#endif
+
 		if (BufferGetBlockNumber(bistate->current_buf) == targetBlock)
 		{
 			/*
-- 
2.38.0

Re: Performance degradation on concurrent COPY into a single relation in PG16.

Reply via email to