On Sat, 19 Aug 2023 at 11:53, Amit Kapila <amit.kapil...@gmail.com> wrote:
>
> It's entirely possible for a logical slot to have a confirmed_flush
> LSN higher than the last value saved on disk while not being marked as
> dirty. It's currently not a problem to lose that value during a clean
> shutdown / restart cycle but to support the upgrade of logical slots
> [1] (see latest patch at [2]), we seem to rely on that value being
> properly persisted to disk. During the upgrade, we need to verify that
> all the data prior to shutdown_checkpoint for the logical slots has
> been consumed, otherwise, the downstream may miss some data. Now, to
> ensure the same, we are planning to compare the confirmed_flush LSN
> location with the latest shutdown_checkpoint location, which means that
> the confirmed_flush LSN should be updated after restart.
>
> I think this is inefficient even without an upgrade because, after the
> restart, this may lead to decoding some data again. Say, we process
> some transactions for which we didn't send anything downstream (the
> changes got filtered) but the confirmed_flush LSN is updated due to
> keepalives. As we don't flush the latest value of confirmed_flush LSN,
> it may lead to processing the same changes again.
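The upgrade-time check described above boils down to comparing two LSNs: the
slot's confirmed_flush LSN and the shutdown checkpoint location. As a minimal
illustration (not PostgreSQL code; the LSN values below are taken from the
test logs further down and the helper function is hypothetical), an LSN of
the form 'X/Y' can be converted to a 64-bit number and compared in shell:

```shell
#!/bin/bash

# lsn_to_int: convert a PostgreSQL LSN string like '0/156AD10'
# (high 32 bits / low 32 bits, both hex) into a single 64-bit integer.
lsn_to_int() {
    local hi=${1%%/*} lo=${1##*/}
    echo $(( (16#$hi << 32) | 16#$lo ))
}

confirmed_flush='0/156AD10'   # sample slot position
shutdown_ckpt='0/156AE40'     # sample shutdown checkpoint location

if [ "$(lsn_to_int "$confirmed_flush")" -ge "$(lsn_to_int "$shutdown_ckpt")" ]; then
    echo "slot is caught up to the shutdown checkpoint"
else
    echo "slot lags the shutdown checkpoint"
fi
```

With the sample values above, the second branch is taken, which is exactly the
situation that would make the upgrade check fail if confirmed_flush is not
persisted at shutdown.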
I was able to test and verify that, with the patch, we do not process the
same changes again.

Note: 0001-Add-logs-to-skip-transaction-filter-insert-operation.patch adds a
log line when decoding of a transaction is skipped and another log line when
an operation is filtered.

The test.sh script has the steps for:
a) setting up logical replication for a table
b) performing an insert on the table that is published (this will be
   replicated to the subscriber)
c) performing an insert on a table that is not published (this insert will
   be filtered and not replicated)
d) sleeping for 5 seconds
e) stopping the server
f) starting the server

I used the following steps. First, on HEAD:
a) Apply 0001-Add-logs-to-skip-transaction-filter-insert-operation.patch and
   build the binaries
b) Execute test.sh
c) View the N1.log file to see that the insert operations are filtered again,
   by looking for the following logs:

LOG:  Filter insert for table tbl2
...
===restart===
...
LOG:  Skipping transaction 0/156AD10 as start decode at is greater 0/156AE40
...
LOG:  Filter insert for table tbl2

We can see that the insert operations on tbl2 which were filtered before the
server was stopped are filtered again after the restart on HEAD.

Now let's verify that the same changes are not processed again with the patch:
a) Apply v4-0001-Persist-to-disk-logical-slots-during-a-shutdown-c.patch from
   [1] along with
   0001-Add-logs-to-skip-transaction-filter-insert-operation.patch, and build
   the binaries
b) Execute test.sh
c) View the N1.log file to see that the insert operations are skipped after
   the server restart, by looking for the following logs:

LOG:  Filter insert for table tbl2
...
===restart===
...
Skipping transaction 0/156AD10 as start decode at is greater 0/156AFB0
...
Skipping transaction 0/156AE80 as start decode at is greater 0/156AFB0

We can see that the insert operations on tbl2 are not processed again after
the restart with the patch.
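To make the before/after comparison mechanical rather than eyeballing the log,
one could grep N1.log for the two messages added by the logging patch. A small
self-contained sketch (it builds a sample log from the HEAD output quoted
above, so the file name and contents are illustrative only):

```shell
#!/bin/bash

# Build a sample publisher log from the lines quoted in the HEAD test run.
log=/tmp/N1_sample.log
cat > "$log" <<'EOF'
LOG:  Filter insert for table tbl2
===restart===
LOG:  Skipping transaction 0/156AD10 as start decode at is greater 0/156AE40
LOG:  Filter insert for table tbl2
EOF

# Count how often the filtered insert was decoded and how many transactions
# were skipped. With the persistence patch the filter message should appear
# only once (before the restart); on HEAD it appears again afterwards.
filtered=$(grep -c 'Filter insert for table tbl2' "$log")
skipped=$(grep -c 'Skipping transaction' "$log")
echo "filter-count=$filtered skip-count=$skipped"
```

For this HEAD sample it prints filter-count=2 skip-count=1, showing the
duplicate decode; a patched run would give filter-count=1.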
[1] - https://www.postgresql.org/message-id/CALDaNm0VrAt24e2FxbOX6eJQ-G_tZ0gVpsFBjzQM99NxG0hZfg%40mail.gmail.com

Regards,
Vignesh
#!/bin/bash

port_publisher=5431
port_subscriber=5432

bindir=/home/vignesh/postgres/inst/bin

echo '##########'
echo '#Clean up#'
echo '##########'

pg_ctl stop -D data_N2
pg_ctl stop -D data_N1
rm -r data_N1 data_N2 *log

echo '########'
echo '#Set up#'
echo '########'

initdb -D data_N1 -U postgres
initdb -D data_N2 -U postgres

cat << EOF >> data_N1/postgresql.conf
wal_level = logical
port = $port_publisher
wal_sender_timeout = 5s
#log_min_messages = debug3
EOF

cat << EOF >> data_N2/postgresql.conf
wal_level = logical
port = $port_subscriber
EOF

# Boot database instances
pg_ctl -D data_N1 start -w -l N1.log
pg_ctl -D data_N2 start -w -l N2.log

# Setup as publisher/subscriber
psql -U postgres -p $port_publisher -c "CREATE TABLE tbl (a int, b int);"
psql -U postgres -p $port_publisher -c "CREATE TABLE tbl2 (a int, b int);"
psql -U postgres -p $port_publisher -c "CREATE EXTENSION pg_walinspect"
psql -U postgres -p $port_publisher -c "CREATE PUBLICATION pub FOR TABLE tbl;"
psql -U postgres -p $port_subscriber -c "CREATE TABLE tbl (a int, b int);"
psql -U postgres -p $port_subscriber -c "CREATE TABLE tbl2 (a int, b int);"
psql -U postgres -p $port_subscriber -c "CREATE SUBSCRIPTION sub CONNECTION 'user=postgres dbname=postgres port=$port_publisher' PUBLICATION pub WITH (copy_data = off)"

# Do INSERTs on the publisher
psql -U postgres -p $port_publisher -c "INSERT INTO tbl VALUES (generate_series(1, 5))"
psql -U postgres -p $port_publisher -c "INSERT INTO tbl2 VALUES (generate_series(1, 5))"

# Wait a short time to make sure the subscriber has caught up
sleep 5s
psql -U postgres -p $port_subscriber -c "SELECT COUNT(*) FROM tbl;"

# Stop both nodes and reboot to emulate pg_upgrade
pg_ctl stop -D data_N2
pg_ctl stop -D data_N1
pg_ctl -D data_N1 start -w -l N1.log -o "-b"

echo '###############################################################'
echo '#Check difference of WAL position between publisher/subscriber#'
echo '###############################################################'

psql -U postgres -p $port_publisher -c "
WITH tmp AS (
    SELECT confirmed_flush_lsn FROM pg_replication_slots
)
SELECT row_number() OVER (), start_lsn, end_lsn, prev_lsn, xid,
       resource_manager, record_type, record_length, main_data_length,
       fpi_length, description, block_ref
FROM tmp, pg_get_wal_records_info(tmp.confirmed_flush_lsn, 'FFFFFFFF/FFFFFFFF');
"

pg_ctl -D data_N2 start -w -l N2.log -o "-b"
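As a complement to the pg_get_wal_records_info query in the script, the
shutdown checkpoint side of the comparison can be read from pg_controldata on
a stopped cluster. A sketch of extracting that value (the line below is a
sample of pg_controldata output with a hypothetical LSN, so the script runs
without a live cluster):

```shell
#!/bin/bash

# Sample line as printed by pg_controldata on a cleanly shut-down cluster;
# the LSN value here is hypothetical.
sample='Latest checkpoint location:           0/156AFB0'

# Extract the LSN after the colon, discarding the padding spaces.
ckpt=$(printf '%s\n' "$sample" | awk -F':[[:space:]]*' '/Latest checkpoint location/ {print $2}')
echo "shutdown checkpoint at $ckpt"
```

In a real run one would replace the sample with
`"$bindir"/pg_controldata data_N1` and compare the extracted LSN against the
slot's confirmed_flush_lsn from pg_replication_slots.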
0001-Add-logs-to-skip-transaction-filter-insert-operation.patch