https://postgr.es/m/[email protected] wrote:
> Separable, nontrivial things not fixed in the attached patch stack:
> - Trouble is possible, I bet, if the system crashes between the inplace-update
> memcpy() and XLogInsert(). See the new XXX comment below the memcpy().
That comment:
/*----------
* XXX A crash here can allow datfrozenxid() to get ahead of
relfrozenxid:
*
* ["D" is a VACUUM (ONLY_DATABASE_STATS)]
* ["R" is a VACUUM tbl]
* D: vac_update_datfrozenid() -> systable_beginscan(pg_class)
* D: systable_getnext() returns pg_class tuple of tbl
* R: memcpy() into pg_class tuple of tbl
* D: raise pg_database.datfrozenxid, XLogInsert(), finish
* [crash]
* [recovery restores datfrozenxid w/o relfrozenxid]
*/
> Might solve this by inplace update setting DELAY_CHKPT, writing WAL, and
> finally issuing memcpy() into the buffer.
That fix worked. Along with that, I'm attaching a not-for-commit patch with a
test case and one with the fix rebased on that test case. Apply on top of the
v2 patch stack from https://postgr.es/m/[email protected].
This gets key testing from 027_stream_regress.pl; when I commented out some
memcpy lines of the heapam.c change, that test caught it.
This resolves the last inplace update defect known to me.
Thanks,
nm
Author: Noah Misch <[email protected]>
Commit: Noah Misch <[email protected]>
WAL-log inplace update before revealing it to other sessions.
A buffer lock won't stop a reader having already checked tuple
visibility. If a vac_update_datfrozenid() and then a crash happened
during inplace update of a relfrozenxid value, datfrozenxid could
overtake relfrozenxid. Back-patch to v12 (all supported versions).
Reviewed by FIXME.
Discussion: https://postgr.es/m/FIXME
diff --git a/src/backend/access/heap/README.tuplock
b/src/backend/access/heap/README.tuplock
index fb06ff2..aec8dcc 100644
--- a/src/backend/access/heap/README.tuplock
+++ b/src/backend/access/heap/README.tuplock
@@ -201,6 +201,4 @@ Inplace updates create an exception to the rule that tuple
data won't change
under a reader holding a pin. A reader of a heap_fetch() result tuple may
witness a torn read. Current inplace-updated fields are aligned and are no
wider than four bytes, and current readers don't need consistency across
-fields. Hence, they get by with just fetching each field once. XXX such a
-caller may also read a value that has not reached WAL; see
-heap_inplace_update_finish().
+fields. Hence, they get by with just fetching each field once.
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index d7e417f..2a5fea5 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6305,6 +6305,8 @@ heap_inplace_update_finish(void *state, HeapTuple tuple)
Relation relation = scan->heap_rel;
uint32 oldlen;
uint32 newlen;
+ char *dst;
+ char *src;
int nmsgs = 0;
SharedInvalidationMessage *invalMessages = NULL;
bool RelcacheInitFileInval = false;
@@ -6315,6 +6317,9 @@ heap_inplace_update_finish(void *state, HeapTuple tuple)
if (oldlen != newlen || htup->t_hoff != tuple->t_data->t_hoff)
elog(ERROR, "wrong tuple length");
+ dst = (char *) htup + htup->t_hoff;
+ src = (char *) tuple->t_data + tuple->t_data->t_hoff;
+
/*
* Construct shared cache inval if necessary. Note that because we only
* pass the new version of the tuple, this mustn't be used for any
@@ -6338,15 +6343,15 @@ heap_inplace_update_finish(void *state, HeapTuple tuple)
*/
PreInplace_Inval();
- /* NO EREPORT(ERROR) from here till changes are logged */
- START_CRIT_SECTION();
-
- memcpy((char *) htup + htup->t_hoff,
- (char *) tuple->t_data + tuple->t_data->t_hoff,
- newlen);
-
/*----------
- * XXX A crash here can allow datfrozenxid() to get ahead of
relfrozenxid:
+ * NO EREPORT(ERROR) from here till changes are complete
+ *
+ * Our buffer lock won't stop a reader having already pinned and checked
+ * visibility for this tuple. Hence, we write WAL first, then mutate
the
+ * buffer. Like in MarkBufferDirtyHint() or RecordTransactionCommit(),
+ * checkpoint delay makes that acceptable. With the usual order of
+ * changes, a crash after memcpy() and before XLogInsert() could allow
+ * datfrozenxid to overtake relfrozenxid:
*
* ["D" is a VACUUM (ONLY_DATABASE_STATS)]
* ["R" is a VACUUM tbl]
@@ -6356,14 +6361,28 @@ heap_inplace_update_finish(void *state, HeapTuple tuple)
* D: raise pg_database.datfrozenxid, XLogInsert(), finish
* [crash]
* [recovery restores datfrozenxid w/o relfrozenxid]
+ *
+ * Like in MarkBufferDirtyHint() subroutine XLogSaveBufferForHint(),
copy
+ * the buffer to the stack before logging. Here, that facilitates a FPI
+ * of the post-mutation block before we accept other sessions seeing it.
*/
-
- MarkBufferDirty(buffer);
+ Assert((MyProc->delayChkptFlags & DELAY_CHKPT_START) == 0);
+ START_CRIT_SECTION();
+ MyProc->delayChkptFlags |= DELAY_CHKPT_START;
/* XLOG stuff */
if (RelationNeedsWAL(relation))
{
xl_heap_inplace xlrec;
+ PGAlignedBlock copied_buffer;
+ char *origdata = (char *) BufferGetBlock(buffer);
+ Page page = BufferGetPage(buffer);
+ uint16 lower = ((PageHeader) page)->pd_lower;
+ uint16 upper = ((PageHeader) page)->pd_upper;
+ uintptr_t dst_offset_in_block;
+ RelFileLocator rlocator;
+ ForkNumber forkno;
+ BlockNumber blkno;
XLogRecPtr recptr;
xlrec.offnum = ItemPointerGetOffsetNumber(&tuple->t_self);
@@ -6378,16 +6397,28 @@ heap_inplace_update_finish(void *state, HeapTuple tuple)
XLogRegisterData((char *) invalMessages,
nmsgs *
sizeof(SharedInvalidationMessage));
- XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
- XLogRegisterBufData(0, (char *) htup + htup->t_hoff, newlen);
+ /* register block matching what buffer will look like after
changes */
+ memcpy(copied_buffer.data, origdata, lower);
+ memcpy(copied_buffer.data + upper, origdata + upper, BLCKSZ -
upper);
+ dst_offset_in_block = dst - origdata;
+ memcpy(copied_buffer.data + dst_offset_in_block, src, newlen);
+ BufferGetTag(buffer, &rlocator, &forkno, &blkno);
+ Assert(forkno == MAIN_FORKNUM);
+ XLogRegisterBlock(0, &rlocator, forkno, blkno,
copied_buffer.data,
+ REGBUF_STANDARD);
+ XLogRegisterBufData(0, src, newlen);
/* inplace updates aren't decoded atm, don't log the origin */
recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_INPLACE);
- PageSetLSN(BufferGetPage(buffer), recptr);
+ PageSetLSN(page, recptr);
}
+ memcpy(dst, src, newlen);
+
+ MarkBufferDirty(buffer);
+
LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
/*
@@ -6400,6 +6431,7 @@ heap_inplace_update_finish(void *state, HeapTuple tuple)
*/
AtInplace_Inval();
+ MyProc->delayChkptFlags &= ~DELAY_CHKPT_START;
END_CRIT_SECTION();
UnlockTuple(relation, &tuple->t_self, InplaceUpdateTupleLock);
systable_endscan(scan);
Author: Noah Misch <[email protected]>
Commit: Noah Misch <[email protected]>
[not for commit] demonstrate datfrozenxid overtaking relfrozenxid
Components that might be separate patches if committing:
- Allow injection points in critical sections
- Emit DEBUG1 before waiting on injection point, so TAP test can poll for it
- BackgroundPsql: add feature to wait for stderr content, not just stdout
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index d7e417f..c4d18ad 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6344,6 +6344,7 @@ heap_inplace_update_finish(void *state, HeapTuple tuple)
memcpy((char *) htup + htup->t_hoff,
(char *) tuple->t_data + tuple->t_data->t_hoff,
newlen);
+ INJECTION_POINT("inplace-after-mempcy");
/*----------
* XXX A crash here can allow datfrozenxid() to get ahead of
relfrozenxid:
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index d299a25..c4020e3 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -37,6 +37,7 @@
#include "catalog/index.h"
#include "catalog/namespace.h"
#include "catalog/pg_database.h"
+#include "catalog/pg_enum_d.h"
#include "catalog/pg_inherits.h"
#include "commands/cluster.h"
#include "commands/defrem.h"
@@ -56,6 +57,7 @@
#include "utils/fmgroids.h"
#include "utils/guc.h"
#include "utils/guc_hooks.h"
+#include "utils/injection_point.h"
#include "utils/memutils.h"
#include "utils/snapmgr.h"
#include "utils/syscache.h"
@@ -1575,7 +1577,7 @@ vac_update_datfrozenxid(void)
{
HeapTuple tuple;
Form_pg_database dbform;
- Relation relation;
+ Relation relation, dbrelation;
SysScanDesc scan;
HeapTuple classTup;
TransactionId newFrozenXid;
@@ -1629,11 +1631,24 @@ vac_update_datfrozenxid(void)
scan = systable_beginscan(relation, InvalidOid, false,
NULL, 0, NULL);
+ /*
+ * This may process invals that need pg_class buffer locks, so get it
out
+ * of the way.
+ */
+ dbrelation = table_open(DatabaseRelationId, RowExclusiveLock);
+
while ((classTup = systable_getnext(scan)) != NULL)
{
volatile FormData_pg_class *classForm = (Form_pg_class)
GETSTRUCT(classTup);
- TransactionId relfrozenxid = classForm->relfrozenxid;
- TransactionId relminmxid = classForm->relminmxid;
+ TransactionId relfrozenxid;
+ TransactionId relminmxid;
+
+#ifdef USE_INJECTION_POINTS
+ if (classForm->oid == EnumRelationId)
+
INJECTION_POINT("update_datfrozenxid-before-fetch-relfrozenxid");
+#endif
+ relfrozenxid = classForm->relfrozenxid;
+ relminmxid = classForm->relminmxid;
/*
* Only consider relations able to hold unfrozen XIDs (anything
else
@@ -1706,7 +1721,8 @@ vac_update_datfrozenxid(void)
Assert(MultiXactIdIsValid(newMinMulti));
/* Now fetch the pg_database tuple we need to update. */
- relation = table_open(DatabaseRelationId, RowExclusiveLock);
+ /* relation = table_open(DatabaseRelationId, RowExclusiveLock); */
+ relation = dbrelation;
/*
* Fetch a copy of the tuple to scribble on. We could check the
syscache
diff --git a/src/backend/utils/misc/injection_point.c
b/src/backend/utils/misc/injection_point.c
index 5c2a0d2..64720f3 100644
--- a/src/backend/utils/misc/injection_point.c
+++ b/src/backend/utils/misc/injection_point.c
@@ -291,11 +291,18 @@ void
InjectionPointRun(const char *name)
{
#ifdef USE_INJECTION_POINTS
+ bool reset_allow = false;
InjectionPointEntry *entry_by_name;
bool found;
InjectionPointCallback injection_callback;
const void *private_data;
+ if (CritSectionCount > 0 && !MemoryContextAllowAllInCriticalSection)
+ {
+ reset_allow = true;
+ MemoryContextAllowAllInCriticalSection = true;
+ }
+
LWLockAcquire(InjectionPointLock, LW_SHARED);
entry_by_name = (InjectionPointEntry *)
hash_search(InjectionPointHash, name,
@@ -309,7 +316,7 @@ InjectionPointRun(const char *name)
if (!found)
{
injection_point_cache_remove(name);
- return;
+ goto out;
}
/*
@@ -344,6 +351,11 @@ InjectionPointRun(const char *name)
/* Now loaded, so get it. */
injection_callback = injection_point_cache_get(name, &private_data);
injection_callback(name, private_data);
+
+out:
+ /* elog(ERROR) would have become PANIC, so we never miss this reset */
+ if (reset_allow)
+ MemoryContextAllowAllInCriticalSection = false;
#else
elog(ERROR, "Injection points are not supported by this build");
#endif
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index b42ecec..dda3dcb 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -157,6 +157,17 @@ MemoryContext CurTransactionContext = NULL;
/* This is a transient link to the active portal's memory context: */
MemoryContext PortalContext = NULL;
+/*
+ * If true, suppress all assertions about allocations in critical sections.
+ * Behave as though MemoryContextAllowInCriticalSection() had been called for
+ * every context, and permit new contexts. Caller is responsible for
+ * restoring value on error. The intended use case is debugging aids that
+ * can't use a single-purpose context. For example, LockAcquire() allocates
+ * in TopMemoryContext, so a debugging aid that calls it has no clean way to
+ * redirect that allocation.
+ */
+bool MemoryContextAllowAllInCriticalSection = false;
+
static void MemoryContextDeleteOnly(MemoryContext context);
static void MemoryContextCallResetCallbacks(MemoryContext context);
static void MemoryContextStatsInternal(MemoryContext context, int level,
@@ -173,7 +184,8 @@ static void MemoryContextStatsPrint(MemoryContext context,
void *passthru,
* rule, the allocation functions Assert that.
*/
#define AssertNotInCriticalSection(context) \
- Assert(CritSectionCount == 0 || (context)->allowInCritSection)
+ Assert(CritSectionCount == 0 || (context)->allowInCritSection || \
+ MemoryContextAllowAllInCriticalSection)
/*
* Call the given function in the MemoryContextMethods for the memory context
@@ -1103,8 +1115,7 @@ MemoryContextCreate(MemoryContext node,
MemoryContext parent,
const char *name)
{
- /* Creating new memory contexts is not allowed in a critical section */
- Assert(CritSectionCount == 0);
+ Assert(CritSectionCount == 0 || MemoryContextAllowAllInCriticalSection);
/* Initialize all standard fields of memory context header */
node->type = tag;
diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index cd9596f..9889225 100644
--- a/src/include/utils/memutils.h
+++ b/src/include/utils/memutils.h
@@ -93,6 +93,8 @@ extern void MemoryContextStatsDetail(MemoryContext context,
extern void MemoryContextAllowInCriticalSection(MemoryContext context,
bool allow);
+extern PGDLLIMPORT bool MemoryContextAllowAllInCriticalSection;
+
#ifdef MEMORY_CONTEXT_CHECKING
extern void MemoryContextCheck(MemoryContext context);
#endif
diff --git a/src/test/modules/injection_points/Makefile
b/src/test/modules/injection_points/Makefile
index 2ffd2f7..4b75d27 100644
--- a/src/test/modules/injection_points/Makefile
+++ b/src/test/modules/injection_points/Makefile
@@ -10,6 +10,7 @@ REGRESS = injection_points
REGRESS_OPTS = --dlpath=$(top_builddir)/src/test/regress
ISOLATION = inplace
+TAP_TESTS = 1
# The injection points are cluster-wide, so disable installcheck
NO_INSTALLCHECK = 1
diff --git a/src/test/modules/injection_points/injection_points.c
b/src/test/modules/injection_points/injection_points.c
index 1b695a1..1b1cb56 100644
--- a/src/test/modules/injection_points/injection_points.c
+++ b/src/test/modules/injection_points/injection_points.c
@@ -238,6 +238,8 @@ injection_wait(const char *name, const void *private_data)
elog(ERROR, "could not find free slot for wait of injection
point %s ",
name);
+ elog(DEBUG1, "waiting at injection point %s", name);
+
/* And sleep.. */
ConditionVariablePrepareToSleep(&inj_state->wait_point);
for (;;)
diff --git a/src/test/modules/injection_points/meson.build
b/src/test/modules/injection_points/meson.build
index 3c23c14..86639c5 100644
--- a/src/test/modules/injection_points/meson.build
+++ b/src/test/modules/injection_points/meson.build
@@ -42,4 +42,9 @@ tests += {
'inplace',
],
},
+ 'tap': {
+ 'tests': [
+ 't/001_inplace.pl',
+ ],
+ },
}
diff --git a/src/test/modules/injection_points/t/001_inplace.pl
b/src/test/modules/injection_points/t/001_inplace.pl
new file mode 100644
index 0000000..fc9380d
--- /dev/null
+++ b/src/test/modules/injection_points/t/001_inplace.pl
@@ -0,0 +1,129 @@
+# Copyright (c) 2024, PostgreSQL Global Development Group
+
+# Test race between vac_update_datfrozenxid() pg_class scan and inplace update
+# of relfrozenxid:
+#
+# ["D" is a VACUUM (ONLY_DATABASE_STATS)]
+# ["R" is a VACUUM tbl]
+# D: vac_update_datfrozenid() -> systable_beginscan(pg_class)
+# D: systable_getnext() returns pg_class tuple of tbl
+# R: memcpy() into pg_class tuple of tbl
+# D: raise pg_database.datfrozenxid, XLogInsert(), finish
+# [crash]
+# [recovery restores datfrozenxid w/o relfrozenxid]
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+my $node = PostgreSQL::Test::Cluster->new('only');
+# We won't use a replica, but this is handy for (a) pg_waldump on the archive
+# and (b) pg_waldump showing the logged invalidations.
+$node->init(has_archiving => 1, allows_streaming => 1);
+$node->start;
+
+# Choice of table doesn't greatly matter, but injection point triggers on this
+# one. Using a catalog gives the injection point a trivial OID-based test.
+my $frozenxid_table = 'pg_enum';
+my $dat_inj = 'update_datfrozenxid-before-fetch-relfrozenxid';
+my $inplace_inj = 'inplace-after-mempcy';
+
+# session locks one table, as a way to VACUUM all database tables but this one
+my $lock_table = $node->background_psql('postgres');
+# session waits in vac_update_datfrozenxid()
+my $update_dat = $node->background_psql('postgres');
+# session waits in inplace update of vac_update_relstats()
+my $update_rel = $node->background_psql('postgres');
+# session detaches another session
+my $detacher = $node->background_psql('postgres', on_error_stop => 0);
+
+my $start_id;
+$node->psql(
+ 'postgres', qq(
+ SELECT relfrozenxid, datfrozenxid
+ FROM pg_class, pg_database
+ WHERE relname = '$frozenxid_table' AND datname = current_catalog;
+), stdout => \$start_id);
+print STDOUT "relfrozenxid, datfrozenxid: $start_id\n";
+
+# populate syscaches; query fails, but that's fine
+$detacher->query(
+ qq(
+ CREATE EXTENSION injection_points;
+ SELECT injection_points_detach('$dat_inj');
+));
+
+# marker in WAL stream, for pg_waldump convenience
+$node->safe_psql(
+ 'postgres', qq(
+ SELECT pg_logical_emit_message(false, 'before', '.', true);
+));
+
+$lock_table->query_safe(
+ "BEGIN; LOCK TABLE $frozenxid_table IN SHARE UPDATE EXCLUSIVE MODE");
+$update_rel->query_safe("SELECT txid_current()");
+# update relfrozenxid for all tables but $frozenxid_table
+$update_rel->query_safe("VACUUM (FREEZE, SKIP_LOCKED, SKIP_DATABASE_STATS);");
+# start ONLY_DATABASE_STATS and wait for it to reach injection point
+$update_dat->query_until(
+ qr/at injection point/, qq(
+ SELECT injection_points_set_local();
+ SELECT injection_points_attach('$dat_inj', 'wait');
+ SET client_min_messages = debug1;
+ VACUUM (ONLY_DATABASE_STATS);
+));
+
+# start one-table VACUUM and wait for it to reach injection point
+$lock_table->quit;
+$update_rel->query_until(
+ qr/at injection point/, qq(
+ SELECT injection_points_set_local();
+ SELECT injection_points_attach('$inplace_inj', 'wait');
+ SET client_min_messages = debug1;
+ VACUUM (FREEZE, SKIP_DATABASE_STATS) $frozenxid_table;
+));
+
+# complete ONLY_DATABASE_STATS
+$detacher->query(
+ qq(
+ SELECT injection_points_detach('$dat_inj');
+ SELECT injection_points_wakeup('$dat_inj');
+));
+$detacher->quit;
+# flush WAL, since VACUUM did an asynchronous commit
+$update_dat->query(
+ qq(
+ SELECT pg_logical_emit_message(false, 'after', '.', true);
+));
+$update_dat->quit;
+
+# crash with $inplace_inj still waiting
+$node->stop('immediate');
+# don't quit update_rel, which would suffer EPIPE
+#$update_rel->quit;
+$node->start;
+
+# check for corruption
+my $stdout;
+$node->psql(
+ 'postgres', q(
+ SELECT pg_class.oid::regclass
+ FROM pg_class, pg_database
+ WHERE age(relfrozenxid) > age(datfrozenxid)
+ AND relkind = 'r' AND datname = current_catalog;
+), stdout => \$stdout);
+is($stdout, '', 'datfrozenxid younger than every relfrozenxid');
+
+# print more detail
+$node->psql(
+ 'postgres', qq(
+ SELECT relfrozenxid, datfrozenxid
+ FROM pg_class, pg_database
+ WHERE relname = '$frozenxid_table' AND datname = current_catalog;
+), stdout => \$stdout);
+isnt($stdout, $start_id, 'frozenxid values changed');
+print STDOUT "relfrozenxid, datfrozenxid: $stdout\n";
+
+done_testing();
diff --git a/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm
b/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm
index 4091c31..c3f55bb 100644
--- a/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm
+++ b/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm
@@ -260,10 +260,10 @@ sub query_safe
=item $session->query_until(until, query)
-Issue C<query> and wait for C<until> appearing in the query output rather than
-waiting for query completion. C<query> needs to end with newline and semicolon
-(if applicable, interactive psql input may not require it) for psql to process
-the input.
+Issue C<query> and wait for C<until> appearing in the query stdout or stderr
+rather than waiting for query completion. C<query> needs to end with newline
+and semicolon (if applicable, interactive psql input may not require it) for
+psql to process the input.
=cut
@@ -276,7 +276,8 @@ sub query_until
$self->{timeout}->start() if (defined($self->{query_timer_restart}));
$self->{stdin} .= $query;
- pump_until($self->{run}, $self->{timeout}, \$self->{stdout}, $until);
+ pump_until($self->{run}, $self->{timeout},
+ [ \$self->{stdout}, \$self->{stderr} ], $until);
die "psql query timed out" if $self->{timeout}->is_expired;
diff --git a/src/test/perl/PostgreSQL/Test/Utils.pm
b/src/test/perl/PostgreSQL/Test/Utils.pm
index 022b44b..d1791ee 100644
--- a/src/test/perl/PostgreSQL/Test/Utils.pm
+++ b/src/test/perl/PostgreSQL/Test/Utils.pm
@@ -427,29 +427,39 @@ sub run_command
=item pump_until(proc, timeout, stream, until)
-Pump until string is matched on the specified stream, or timeout occurs.
+Pump until string is matched on the specified stream(s), or timeout occurs.
+The stream argument can be a scalar ref like, \$stdout, or a ref to an array
+of scalar refs, like [\$stdout, \$stderr].
=cut
sub pump_until
{
- my ($proc, $timeout, $stream, $until) = @_;
+ my ($proc, $timeout, $stream_spec, $until) = @_;
$proc->pump_nb();
- while (1)
+ $stream_spec = [$stream_spec] if ref $stream_spec ne 'ARRAY';
+ OUTER: while (1)
{
- last if $$stream =~ /$until/;
+ foreach my $stream (@$stream_spec)
+ {
+ last OUTER if $$stream =~ /$until/;
+ }
if ($timeout->is_expired)
{
diag(
- "pump_until: timeout expired when searching for
\"$until\" with stream: \"$$stream\""
- );
+ "pump_until: timeout expired when searching for
\"$until\" with stream(s): "
+ . '"'
+ . join('", "', map { $$_ } @$stream_spec)
+ . '"');
return 0;
}
if (not $proc->pumpable())
{
diag(
- "pump_until: process terminated unexpectedly
when searching for \"$until\" with stream: \"$$stream\""
- );
+ "pump_until: process terminated unexpectedly
when searching for \"$until\" with stream(s): "
+ . '"'
+ . join('", "', map { $$_ } @$stream_spec)
+ . '"');
return 0;
}
$proc->pump();
Author: Noah Misch <[email protected]>
Commit: Noah Misch <[email protected]>
[not for commit] cherry-pick "WAL-log inplace update before" fix onto the
demo
This just resolves a merge conflict around the INJECTION_POINT() line.
diff --git a/src/backend/access/heap/README.tuplock
b/src/backend/access/heap/README.tuplock
index fb06ff2..aec8dcc 100644
--- a/src/backend/access/heap/README.tuplock
+++ b/src/backend/access/heap/README.tuplock
@@ -201,6 +201,4 @@ Inplace updates create an exception to the rule that tuple
data won't change
under a reader holding a pin. A reader of a heap_fetch() result tuple may
witness a torn read. Current inplace-updated fields are aligned and are no
wider than four bytes, and current readers don't need consistency across
-fields. Hence, they get by with just fetching each field once. XXX such a
-caller may also read a value that has not reached WAL; see
-heap_inplace_update_finish().
+fields. Hence, they get by with just fetching each field once.
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index c4d18ad..fe7489b 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6305,6 +6305,8 @@ heap_inplace_update_finish(void *state, HeapTuple tuple)
Relation relation = scan->heap_rel;
uint32 oldlen;
uint32 newlen;
+ char *dst;
+ char *src;
int nmsgs = 0;
SharedInvalidationMessage *invalMessages = NULL;
bool RelcacheInitFileInval = false;
@@ -6315,6 +6317,9 @@ heap_inplace_update_finish(void *state, HeapTuple tuple)
if (oldlen != newlen || htup->t_hoff != tuple->t_data->t_hoff)
elog(ERROR, "wrong tuple length");
+ dst = (char *) htup + htup->t_hoff;
+ src = (char *) tuple->t_data + tuple->t_data->t_hoff;
+
/*
* Construct shared cache inval if necessary. Note that because we only
* pass the new version of the tuple, this mustn't be used for any
@@ -6338,16 +6343,15 @@ heap_inplace_update_finish(void *state, HeapTuple tuple)
*/
PreInplace_Inval();
- /* NO EREPORT(ERROR) from here till changes are logged */
- START_CRIT_SECTION();
-
- memcpy((char *) htup + htup->t_hoff,
- (char *) tuple->t_data + tuple->t_data->t_hoff,
- newlen);
- INJECTION_POINT("inplace-after-mempcy");
-
/*----------
- * XXX A crash here can allow datfrozenxid() to get ahead of
relfrozenxid:
+ * NO EREPORT(ERROR) from here till changes are complete
+ *
+ * Our buffer lock won't stop a reader having already pinned and checked
+ * visibility for this tuple. Hence, we write WAL first, then mutate
the
+ * buffer. Like in MarkBufferDirtyHint() or RecordTransactionCommit(),
+ * checkpoint delay makes that acceptable. With the usual order of
+ * changes, a crash after memcpy() and before XLogInsert() could allow
+ * datfrozenxid to overtake relfrozenxid:
*
* ["D" is a VACUUM (ONLY_DATABASE_STATS)]
* ["R" is a VACUUM tbl]
@@ -6357,14 +6361,28 @@ heap_inplace_update_finish(void *state, HeapTuple tuple)
* D: raise pg_database.datfrozenxid, XLogInsert(), finish
* [crash]
* [recovery restores datfrozenxid w/o relfrozenxid]
+ *
+ * Like in MarkBufferDirtyHint() subroutine XLogSaveBufferForHint(),
copy
+ * the buffer to the stack before logging. Here, that facilitates a FPI
+ * of the post-mutation block before we accept other sessions seeing it.
*/
-
- MarkBufferDirty(buffer);
+ Assert((MyProc->delayChkptFlags & DELAY_CHKPT_START) == 0);
+ START_CRIT_SECTION();
+ MyProc->delayChkptFlags |= DELAY_CHKPT_START;
/* XLOG stuff */
if (RelationNeedsWAL(relation))
{
xl_heap_inplace xlrec;
+ PGAlignedBlock copied_buffer;
+ char *origdata = (char *) BufferGetBlock(buffer);
+ Page page = BufferGetPage(buffer);
+ uint16 lower = ((PageHeader) page)->pd_lower;
+ uint16 upper = ((PageHeader) page)->pd_upper;
+ uintptr_t dst_offset_in_block;
+ RelFileLocator rlocator;
+ ForkNumber forkno;
+ BlockNumber blkno;
XLogRecPtr recptr;
xlrec.offnum = ItemPointerGetOffsetNumber(&tuple->t_self);
@@ -6379,16 +6397,29 @@ heap_inplace_update_finish(void *state, HeapTuple tuple)
XLogRegisterData((char *) invalMessages,
nmsgs *
sizeof(SharedInvalidationMessage));
- XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
- XLogRegisterBufData(0, (char *) htup + htup->t_hoff, newlen);
+ /* register block matching what buffer will look like after
changes */
+ memcpy(copied_buffer.data, origdata, lower);
+ memcpy(copied_buffer.data + upper, origdata + upper, BLCKSZ -
upper);
+ dst_offset_in_block = dst - origdata;
+ memcpy(copied_buffer.data + dst_offset_in_block, src, newlen);
+ BufferGetTag(buffer, &rlocator, &forkno, &blkno);
+ Assert(forkno == MAIN_FORKNUM);
+ XLogRegisterBlock(0, &rlocator, forkno, blkno,
copied_buffer.data,
+ REGBUF_STANDARD);
+ XLogRegisterBufData(0, src, newlen);
/* inplace updates aren't decoded atm, don't log the origin */
recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_INPLACE);
- PageSetLSN(BufferGetPage(buffer), recptr);
+ PageSetLSN(page, recptr);
}
+ memcpy(dst, src, newlen);
+ INJECTION_POINT("inplace-after-mempcy");
+
+ MarkBufferDirty(buffer);
+
LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
/*
@@ -6401,6 +6432,7 @@ heap_inplace_update_finish(void *state, HeapTuple tuple)
*/
AtInplace_Inval();
+ MyProc->delayChkptFlags &= ~DELAY_CHKPT_START;
END_CRIT_SECTION();
UnlockTuple(relation, &tuple->t_self, InplaceUpdateTupleLock);
systable_endscan(scan);