Hi,

When finishing the prepared transaction, we are marking it as "not valid" after
the WAL record has been written. This allows us to be sure that even if we fail
later, no one will be able to commit/rollback this transaction again. Here is
the code that implements it :

```
/*
 * In case we fail while running the callbacks, mark the gxact invalid so
 * no one else will try to commit/rollback, and so it will be recycled if
 * we fail after this point.  It is still locked by our backend so it
 * won't go away yet.
 *
 * (We assume it's safe to do this without taking TwoPhaseStateLock.)
 */
gxact->valid = false;
```

Comment says that we can fail while running the callbacks. But if this really
happens, then we are in trouble :
1) Twophase transaction we are working on is invalid, so AtAbort_Twophase will
simply delete it from shmem.
2) Not all on-commit/on-rollback callbacks are completed, so (for example)
transaction locks may remain.

Thus, if the COMMIT/ROLLBACK PREPARED command faces an ERROR, the user 1) will
not see the transaction in pg_prepared_xacts, but 2) will see its locks in
pg_locks. IIUC, the only way to release these locks is to restart the server.

I don't see any discussion about it on hackers, and I also don't see any sane
fix for this problem. So, I decided to share it with you. Looking forward to
your comments.

Please, see the attached test that demonstrates the problem. Actually,
reproduction is difficult. The only way to get an ERROR during callbacks
execution is to trigger OOM. Since it is hard to implement from perl tests, I
have added an injection point.

(The patch can be applied to the newest master).

--
Best regards,
Daniil Davydov
From 2cbbe8ccde03b46971b81e2f3f0affa99bded0d1 Mon Sep 17 00:00:00 2001
From: Daniil Davidov <[email protected]>
Date: Sat, 23 May 2026 20:27:30 +0700
Subject: [PATCH] Reproduce error with twophase commit

---
 src/backend/access/transam/twophase.c         |  2 +
 src/test/recovery/meson.build                 |  1 +
 .../recovery/t/053_twophase_commit_error.pl   | 76 +++++++++++++++++++
 3 files changed, 79 insertions(+)
 create mode 100644 src/test/recovery/t/053_twophase_commit_error.pl

diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 1035e8b3fc7..07a628e2a78 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -1652,6 +1652,8 @@ FinishPreparedTransaction(const char *gid, bool isCommit)
 	 */
 	LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE);
 
+	INJECTION_POINT("twophase-callbacks-execution", NULL);
+
 	/* And now do the callbacks */
 	if (isCommit)
 		ProcessRecords(bufptr, fxid, twophase_postcommit_callbacks);
diff --git a/src/test/recovery/meson.build b/src/test/recovery/meson.build
index 36d789720a3..6305ce88b67 100644
--- a/src/test/recovery/meson.build
+++ b/src/test/recovery/meson.build
@@ -61,6 +61,7 @@ tests += {
       't/050_redo_segment_missing.pl',
       't/051_effective_wal_level.pl',
       't/052_checkpoint_segment_missing.pl',
+      't/053_twophase_commit_error.pl',
     ],
   },
 }
diff --git a/src/test/recovery/t/053_twophase_commit_error.pl b/src/test/recovery/t/053_twophase_commit_error.pl
new file mode 100644
index 00000000000..96006c6dd2d
--- /dev/null
+++ b/src/test/recovery/t/053_twophase_commit_error.pl
@@ -0,0 +1,76 @@
+# Copyright (c) 2026, PostgreSQL Global Development Group
+
+# Test failure when COMMIT PREPARED gets an ERROR after its commit WAL record
+# has been emitted, but before two-phase cleanup callbacks have released locks.
+
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+if ($ENV{enable_injection_points} ne 'yes')
+{
+	plan skip_all => 'Injection points not supported by this build';
+}
+
+my $node = PostgreSQL::Test::Cluster->new('node');
+$node->init;
+$node->append_conf('postgresql.conf', 'max_prepared_transactions = 5');
+$node->start;
+
+if (!$node->check_extension('injection_points'))
+{
+	plan skip_all => 'Extension injection_points not installed';
+}
+
+$node->safe_psql('postgres', 'CREATE EXTENSION injection_points');
+$node->safe_psql(
+	'postgres', q{
+	SELECT injection_points_attach('twophase-callbacks-execution', 'error')
+});
+
+$node->safe_psql(
+	'postgres', q{
+	CREATE TABLE twophase_lock_target(id int);
+	CREATE TABLE twophase_commit_marker(id int);
+
+	BEGIN;
+	LOCK TABLE twophase_lock_target IN ACCESS EXCLUSIVE MODE;
+	INSERT INTO twophase_commit_marker VALUES (1);
+	PREPARE TRANSACTION 'prepared';
+});
+
+my $table_oid = $node->safe_psql(
+	'postgres', q{
+	SELECT oid FROM pg_class WHERE relname = 'twophase_lock_target';
+});
+
+
+my ($ret, $stdout, $stderr) =
+  $node->psql('postgres', q{COMMIT PREPARED 'prepared'});
+
+isnt($ret, 0, 'COMMIT PREPARED failed at injectcion point');
+
+$ret = $node->safe_psql(
+	'postgres', q{
+	SELECT count(*) FROM pg_prepared_xacts;
+});
+is($ret, 0, 'we cannot see transaction in the catalog');
+
+$ret = $node->safe_psql(
+	'postgres', q{
+	SELECT * FROM twophase_commit_marker;
+});
+is($ret, 1, 'transaction has been commited');
+
+
+$ret = $node->safe_psql(
+	'postgres', qq{
+	SELECT COUNT(*) FROM pg_locks WHERE locktype = 'relation' AND relation = $table_oid;
+});
+is($ret, 1, 'orphaned lock detected');
+
+$node->stop;
+done_testing();
-- 
2.43.0

Reply via email to