Hello,
01.02.2024 21:20, vignesh C wrote:
> The patch which you submitted has been awaiting your attention for
> quite some time now. As such, we have moved it to "Returned with
> Feedback" and removed it from the reviewing queue. Depending on
> timing, this may be reversible. Kindly address the feedback you have
> received, and resubmit the patch to the next CommitFest.
While analyzing buildfarm failures, I found [1], which demonstrates the
assertion failure discussed here:
---
031_column_list_publisher.log
TRAP: FailedAssertion("TransactionIdPrecedesOrEquals(safeXid, snap->xmin)", File:
"/home/bf/bf-build/skink/REL_15_STABLE/pgsql.build/../pgsql/src/backend/replication/logical/snapbuild.c", Line: 614,
PID: 1882382)
---
I've managed to reproduce the assertion failure on REL_15_STABLE with the
following modification:
@@ -3928,6 +3928,7 @@ ProcArraySetReplicationSlotXmin(TransactionId xmin, TransactionId catalog_xmin,
 {
     Assert(!already_locked || LWLockHeldByMe(ProcArrayLock));
+pg_usleep(1000);
     if (!already_locked)
         LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
using the script:
numjobs=100
createdb db
export PGDATABASE=db
for ((i=1;i<=100;i++)); do
  echo "iteration $i"
  for ((j=1;j<=numjobs;j++)); do
    echo "
SELECT pg_create_logical_replication_slot('s$j', 'test_decoding');
SELECT txid_current();
" | psql >>/dev/null 2>&1 &
    echo "
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
CREATE_REPLICATION_SLOT slot$j LOGICAL test_decoding USE_SNAPSHOT;
" | psql -d "dbname=db replication=database" >>/dev/null 2>&1 &
  done
  wait
  for ((j=1;j<=numjobs;j++)); do
    echo "
DROP_REPLICATION_SLOT slot$j;
" | psql -d "dbname=db replication=database" >/dev/null
    echo "SELECT pg_drop_replication_slot('s$j');" | psql >/dev/null
  done
  grep 'TRAP' server.log && break;
done
(with
wal_level = logical
max_replication_slots = 200
max_wal_senders = 200
in postgresql.conf)
iteration 18
ERROR: replication slot "slot13" is active for PID 538431
TRAP: FailedAssertion("TransactionIdPrecedesOrEquals(safeXid, snap->xmin)", File:
"snapbuild.c", Line: 614, PID: 538431)
I've also confirmed that fix_concurrent_slot_xmin_update.patch fixes the
issue.
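For anyone rerunning the script, the postgresql.conf settings listed above could be applied with a small helper along these lines. This is only a sketch: the function name apply_repro_settings is made up for illustration, and it assumes PGDATA points at the data directory of a throwaway test cluster. All three settings need a server restart to take effect.

```shell
# Hypothetical helper, not part of the original report: append the three
# settings used for the reproduction to a postgresql.conf file.
# The target defaults to $PGDATA/postgresql.conf (PGDATA is an assumption).
apply_repro_settings() {
    conf="${1:-$PGDATA/postgresql.conf}"
    cat >>"$conf" <<'EOF'
wal_level = logical
max_replication_slots = 200
max_wal_senders = 200
EOF
}
```

Calling apply_repro_settings with no argument appends to the cluster's own configuration file; passing a path writes to that file instead, which is handy for checking the result before touching a real cluster.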
[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2024-05-15%2020%3A55%3A17
Best regards,
Alexander