Hi all, (Added Amit K. in CC.) Some colleagues have reported that it is possible to finish with orphaned records in pg_replication_origin_status, as an effect of table synchronization workers that miss some cleanup actions around replorigin_drop_by_name() when a subscription that spawned these workers (which themselves create origins through replorigin_create() called in LogicalRepSyncTableStart()) is dropped.
Please find attached a small script, courtesy of Tenglong Gu and Daisuke Higuchi, that have been digging into all that. This problem is true since ce0fdbfe9722 that has added the creation of a replication origin for table synchronization workers, as of 14 up to HEAD. One issue with these origin states is that they are sticky: a restart of the node that held the subscription keeps them around due to them getting flushed in replorigin_checkpoint. If my understanding is right, this also means that origin slots are still tracked as in use, implying that replorigin_session_setup() would not be able to reuse them as far as I understand. So, in summary (please correct me if necessary), DropSubscription() attempts a round of cleanup for all the replication origins with a loop over ReplicationOriginNameForLogicalRep(), and while in this case we attempt to drop the origins created by the workers that are tracked as orphaned, the syscache lookup of REPLORIGIDENT fails within the DROP SUBSCRIPTION because replorigin_by_name() cannot find an entry: the replication origin is already gone, but its state has persisted in memory. Then, why would the replication origin be already gone with its state in memory not cleaned up yet? Well, the problematic test case shows that the subscription is dropped while the spawned tablesync workers do the initial table copy, and the replication origin is created in the same transaction as the COPY. DROP SUBSCRIPTION tells the tablesync workers to stop, they stop with the COPY failing and the origin is not seen as something that exists for the session that drops the subscription. The replication state in memory goes out of sync. So it looks like to me that we are missing a code path where the replication origin is dropped but we just ignore to reset the replication state in memory, leading to bloat of the origin slots available. worker.c does things differently: a small transaction is used for the origin creation, hence we would never really see the replication state memory going out-of-sync with the replorigin catalog. One solution would be for tablesync.c to do the same as worker.c: we could use a short transaction that sets the memory state and creates the replication origin, then move to a second transaction for the COPY. A second solution, slightly more complicated, is to create some specific logic to reset the progress state of the origin that's been created in the transaction that runs the initial COPY, which I guess should be in the shape of a transaction abort callback. All these symptoms are pointing out that it is not a good idea, IMO, to expect a replication origin to exist when dropping a subscription when an origin has been created in the context of a transaction that can take a very long time to run, aka the initial tablesync's COPY, so using a short transaction feels like a good thing to do here. I have to admit that this code has become much more complicated over the last couple of years, hence I may not have the full context of how these cleanups actions should be achieved. They cleary should happen though, but I am not completely sure which solution would be better over the other. There may be an entirely different solution, as well. Note that it would not be complicated to create a regression test based on an injection point: waiting during the COPY of a tablesync worker would be enough while issuing a DROP SUBSCRIPTION. So, thoughts or comments? -- Michael
#!/bin/bash
# Create two clusters with wal_level=logical, running at two different
# ports.
PGPORT_PUB=5432
PGPORT_SUB=5433
PGHOST=/tmp
PGROLE=postgres
PGDATABASE=postgres
# Make this number just large enough for COPY to last while DROP
# SUBSCRIPTION runs.
NUM_RECORDS=500000
# The key point is to drop subscription during the table sync worker for
# initial synchronization is running. So, we might need to sleep before
# dropping subscription. The tablesync COPY fails, causing the DROP
# SUBSCRIPTION to miss progress cleanup which is in shared memory.
psql -p $PGPORT_PUB -c"CREATE TABLE tbl(id INTEGER PRIMARY KEY)"
psql -p $PGPORT_PUB -c"insert into tbl values
(generate_series(1,${NUM_RECORDS}))"
psql -p $PGPORT_PUB -c"CREATE TABLE test_01(id INTEGER PRIMARY KEY)"
psql -p $PGPORT_PUB -c"CREATE TABLE test_02(id INTEGER PRIMARY KEY)"
psql -p $PGPORT_PUB -c"CREATE TABLE test_03(id INTEGER PRIMARY KEY)"
psql -p $PGPORT_PUB -c"CREATE TABLE test_04(id INTEGER PRIMARY KEY)"
psql -p $PGPORT_PUB -c"CREATE TABLE test_05(id INTEGER PRIMARY KEY)"
psql -p $PGPORT_PUB -c"insert into test_01 select * from tbl;"
psql -p $PGPORT_PUB -c"insert into test_02 select * from tbl;"
psql -p $PGPORT_PUB -c"insert into test_03 select * from tbl;"
psql -p $PGPORT_PUB -c"insert into test_04 select * from tbl;"
psql -p $PGPORT_PUB -c"insert into test_05 select * from tbl;"
psql -p $PGPORT_PUB -c"CREATE PUBLICATION pub FOR ALL TABLES;"
psql -p $PGPORT_PUB -c"SELECT * FROM pg_publication;"
psql -p $PGPORT_SUB -c"CREATE TABLE tbl(id INTEGER PRIMARY KEY)"
psql -p $PGPORT_SUB -c"CREATE TABLE test_01(id INTEGER PRIMARY KEY)"
psql -p $PGPORT_SUB -c"CREATE TABLE test_02(id INTEGER PRIMARY KEY)"
psql -p $PGPORT_SUB -c"CREATE TABLE test_03(id INTEGER PRIMARY KEY)"
psql -p $PGPORT_SUB -c"CREATE TABLE test_04(id INTEGER PRIMARY KEY)"
psql -p $PGPORT_SUB -c"CREATE TABLE test_05(id INTEGER PRIMARY KEY)"
psql -p $PGPORT_SUB -c"CREATE SUBSCRIPTION sub CONNECTION 'host=${PGHOST}
port=${PGPORT_PUB} user=${PGROLE} dbname=${PGDATABASE}' PUBLICATION pub;"
psql -p $PGPORT_SUB -c"SELECT * FROM pg_subscription;"
sleep 2
psql -p $PGPORT_SUB -c"select * from pg_replication_origin"
psql -p $PGPORT_SUB -c"select * from pg_replication_origin_status"
psql -p $PGPORT_SUB -c"drop subscription sub ;"
psql -p $PGPORT_SUB -c"select version()"
psql -p $PGPORT_SUB -c"select * from pg_replication_origin"
psql -p $PGPORT_SUB -c"select * from pg_replication_origin_status"
signature.asc
Description: PGP signature
