On Sat, 19 Aug 2023 at 11:53, Amit Kapila <amit.kapil...@gmail.com> wrote:
>
> It's entirely possible for a logical slot to have a confirmed_flush
> LSN higher than the last value saved on disk while not being marked as
> dirty.  It's currently not a problem to lose that value during a clean
> shutdown / restart cycle but to support the upgrade of logical slots
> [1] (see latest patch at [2]), we seem to rely on that value being
> properly persisted to disk. During the upgrade, we need to verify that
> all the data prior to shutdown_checkpoint for the logical slots has
> been consumed, otherwise, the downstream may miss some data. Now, to
> ensure the same, we are planning to compare the confirm_flush LSN
> location with the latest shutdown_checkpoint location which means that
> the confirm_flush LSN should be updated after restart.
>
> I think this is inefficient even without an upgrade because, after the
> restart, this may lead to decoding some data again. Say, we process
> some transactions for which we didn't send anything downstream (the
> changes got filtered) but the confirm_flush LSN is updated due to
> keepalives. As we don't flush the latest value of confirm_flush LSN,
> it may lead to processing the same changes again.

I tested this and was able to verify both behaviors: in HEAD the same
changes are decoded again after a restart, while with the patch they
are not.
Note: 0001-Add-logs-to-skip-transaction-filter-insert-operation.patch
adds a log message when a decoded transaction is skipped and another
when an operation is filtered out.
The test.sh script performs the following steps:
a) set up logical replication for a table
b) insert into a table that is published (these rows will be
replicated to the subscriber)
c) insert into a table that is not published (these inserts will be
filtered out and not replicated)
d) sleep for 5 seconds
e) stop the server
f) start the server
First, in HEAD:
a) apply 0001-Add-logs-to-skip-transaction-filter-insert-operation.patch
and build the binaries
b) execute test.sh
c) check N1.log to confirm that the insert operations were filtered
again after restart, by looking for the following logs:
LOG:  Filter insert for table tbl2
...
===restart===
...
LOG:  Skipping transaction 0/156AD10 as start decode at is greater 0/156AE40
...
LOG:  Filter insert for table tbl2

We can see that the insert operations on tbl2, which were filtered
before the server was stopped, are decoded and filtered again after
the restart in HEAD.
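Incidentally, the LSNs in the "Skipping transaction" line can be
compared numerically to confirm the ordering: a pg_lsn value like
0/156AD10 is a 64-bit WAL position split into two hex halves. A small
sketch (lsn_to_num is a hypothetical helper, not something from the
patch):

```shell
#!/bin/bash
# Hypothetical helper: turn a pg_lsn such as 0/156AD10 into a plain
# integer (high 32 bits / low 32 bits) so two LSNs compare in shell.
lsn_to_num() {
  hi=${1%%/*}
  lo=${1##*/}
  echo $(( (0x$hi << 32) + 0x$lo ))
}

# The restart log above says decoding starts at 0/156AE40, which is past
# the filtered transaction at 0/156AD10, so that transaction is skipped.
if [ "$(lsn_to_num 0/156AE40)" -gt "$(lsn_to_num 0/156AD10)" ]; then
  echo "start decode point is past the transaction"
fi
```

psql can do the same comparison directly with the pg_lsn type, e.g.
SELECT '0/156AE40'::pg_lsn > '0/156AD10'::pg_lsn;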

Now let's verify that the same changes are not processed again with
the patch:
a) apply v4-0001-Persist-to-disk-logical-slots-during-a-shutdown-c.patch
from [1] along with
0001-Add-logs-to-skip-transaction-filter-insert-operation.patch and
build the binaries
b) execute test.sh
c) check N1.log to confirm that the transactions were skipped after
the server restart, by looking for the following logs:
LOG:  Filter insert for table tbl2
...
===restart===
...
Skipping transaction 0/156AD10 as start decode at is greater 0/156AFB0
...
Skipping transaction 0/156AE80 as start decode at is greater 0/156AFB0

We can see that the insert operations on tbl2 are not processed again
after restart with the patch.
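For a less eyeball-driven check, the after-restart counts can be pulled
out of the log with grep. A sketch over the excerpt quoted above; the
===restart=== marker stands in for the restart point (in a real N1.log
you would anchor on the "database system is ready to accept connections"
line instead):

```shell
#!/bin/bash
# Reuse the HEAD log excerpt from the message; in a real run this would
# be the N1.log produced by test.sh.
cat > /tmp/n1_excerpt.log <<'EOF'
LOG:  Filter insert for table tbl2
===restart===
LOG:  Skipping transaction 0/156AD10 as start decode at is greater 0/156AE40
LOG:  Filter insert for table tbl2
EOF

# Count "Filter insert" messages after the restart marker: 1 in HEAD
# (the change is decoded and filtered again), 0 expected with the patch.
after_restart=$(sed -n '/===restart===/,$p' /tmp/n1_excerpt.log)
echo "$after_restart" | grep -c "Filter insert for table tbl2"
```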

[1] - 
https://www.postgresql.org/message-id/CALDaNm0VrAt24e2FxbOX6eJQ-G_tZ0gVpsFBjzQM99NxG0hZfg%40mail.gmail.com

Regards,
Vignesh
#!/bin/bash

port_publisher=5431
port_subscriber=5432
bindir=/home/vignesh/postgres/inst/bin
# make sure the pg_ctl/initdb/psql calls below use the binaries in $bindir
export PATH=$bindir:$PATH

echo '##########'
echo '#Clean up#'
echo '##########'

pg_ctl stop -D data_N2
pg_ctl stop -D data_N1

rm -rf data_N1 data_N2 *log

echo '########'
echo '#Set up#'
echo '########'

initdb -D data_N1 -U postgres
initdb -D data_N2 -U postgres

cat << EOF >> data_N1/postgresql.conf
wal_level = logical
port = $port_publisher
wal_sender_timeout = 5s
#log_min_messages = debug3
EOF

cat << EOF >> data_N2/postgresql.conf
wal_level = logical
port = $port_subscriber
EOF

# Boot database instances
pg_ctl -D data_N1 start -w -l N1.log
pg_ctl -D data_N2 start -w -l N2.log

# Setup as publisher/subscriber
psql -U postgres -p $port_publisher -c "CREATE TABLE tbl (a int, b int);"
psql -U postgres -p $port_publisher -c "CREATE TABLE tbl2 (a int, b int);"
psql -U postgres -p $port_publisher -c "CREATE EXTENSION pg_walinspect"
psql -U postgres -p $port_publisher -c "CREATE PUBLICATION pub FOR TABLE tbl;"

psql -U postgres -p $port_subscriber -c "CREATE TABLE tbl (a int, b int);"
psql -U postgres -p $port_subscriber -c "CREATE TABLE tbl2 (a int, b int);"
psql -U postgres -p $port_subscriber -c "CREATE SUBSCRIPTION sub CONNECTION 'user=postgres dbname=postgres port=$port_publisher' PUBLICATION pub WITH (copy_data = off)"

# do INSERT on publisher
psql -U postgres -p $port_publisher -c "INSERT INTO tbl VALUES (generate_series(1, 5))"
psql -U postgres -p $port_publisher -c "INSERT INTO tbl2 VALUES (generate_series(1, 5))"

# Wait a short time to let the subscriber catch up
sleep 5s

psql -U postgres -p $port_subscriber -c "SELECT COUNT(*) FROM tbl;"

# Stop both nodes and reboot to emulate pg_upgrade
pg_ctl stop -D data_N2
pg_ctl stop -D data_N1

pg_ctl -D data_N1 start -w -l N1.log -o "-b"

echo '###############################################################'
echo '#Check difference of WAL position between publisher/subscriber#'
echo '###############################################################'

psql -U postgres -p $port_publisher -c "
WITH tmp as (
    SELECT confirmed_flush_lsn FROM pg_replication_slots
)
SELECT row_number() over (), start_lsn, end_lsn, prev_lsn, xid, resource_manager, record_type,
       record_length, main_data_length, fpi_length, description, block_ref
       FROM tmp, pg_get_wal_records_info(tmp.confirmed_flush_lsn, 'FFFFFFFF/FFFFFFFF');
"

pg_ctl -D data_N2 start -w -l N2.log -o "-b"
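
One fragile spot in test.sh is the fixed "sleep 5s" before checking the
subscriber: on a slow or loaded machine five seconds may not be enough.
A retry loop is a common alternative; a sketch, where wait_until and
rows_replicated are hypothetical helpers and the psql call assumes the
same port/user as the script:

```shell
#!/bin/bash
# Retry a command until it succeeds instead of sleeping a fixed time.
# WAIT_TRIES/WAIT_INTERVAL default to 30 attempts, one second apart.
wait_until() {
  local i
  for i in $(seq 1 "${WAIT_TRIES:-30}"); do
    "$@" && return 0
    sleep "${WAIT_INTERVAL:-1}"
  done
  return 1
}

# Hypothetical predicate for test.sh: all 5 rows arrived on the subscriber.
rows_replicated() {
  [ "$(psql -U postgres -p 5432 -Atc 'SELECT count(*) FROM tbl;')" = "5" ]
}

# In test.sh, "sleep 5s" could then become:
#   wait_until rows_replicated || { echo "subscriber did not catch up"; exit 1; }
```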

Attachment: 0001-Add-logs-to-skip-transaction-filter-insert-operation.patch
Description: Binary data
