Here is further performance test analysis with the v16 patch-set.
In the test scenarios already shared on -hackers [1], where pgbench was run only on the publisher node in a pub-sub setup, no performance degradation was observed on either node. In contrast, when pgbench was run only on the subscriber side with detect_update_deleted=on [2], TPS was reduced due to dead tuple accumulation. This performance drop depended on wal_receiver_status_interval: larger intervals resulted in more dead tuple accumulation on the subscriber node. However, after the improvement in patch v16-0002, which dynamically tunes the status request, the default TPS reduction was limited to only 1%.

We performed more benchmarks with the v16 patches where pgbench was run on both the publisher and the subscriber, focusing on TPS performance. To summarize the key observations:
 - No performance impact on the publisher, as dead tuple accumulation does not occur on the publisher.
 - Performance is reduced on the subscriber side (~50% TPS reduction [3]) due to dead tuple retention for conflict detection when detect_update_deleted=on.
 - The reduction happens only on the subscriber side because the workload on the publisher is quite high, and the apply worker must wait for the transactions with earlier timestamps to be applied and flushed before advancing the non-removable XID to remove dead tuples.
 - To validate this further, we modified the patch to check only each transaction's commit_time and advance the non-removable XID if the commit_time is greater than candidate_xid_time. The benchmark results [4] remained consistent, showing a similar performance reduction. This confirms that the performance impact on the subscriber side is reasonable behavior if we want to detect the update_deleted conflict reliably.

We also tested similar scenarios with physical streaming replication, to see the effect of enabling hot_standby_feedback and recovery_min_apply_delay. The benchmark results [5] showed a performance reduction in these cases as well, though the impact was smaller than in the update_deleted scenario because the physical walreceiver does not need to wait for the specified WAL to be applied before sending the hot-standby feedback message. However, as recovery_min_apply_delay increased, a similar TPS reduction (~50%) was observed, aligning with the behavior seen in the update_deleted case.

Based on the above, I think the performance reduction observed with the update_deleted patch is expected and necessary, because the patch's main goal is to retain dead tuples for reliable conflict detection. Reducing this retention period would compromise the accuracy of update_deleted detection.

The detailed benchmark results are as follows:

[1] https://www.postgresql.org/message-id/CABdArM5SpMyGvQTsX0-d%3Db%2BJAh0VQjuoyf9jFqcrQ3JLws5eOw%40mail.gmail.com
[2] https://www.postgresql.org/message-id/TYAPR01MB5692B0182356F041DC9DE3B5F53E2%40TYAPR01MB5692.jpnprd01.prod.outlook.com

[3] Test with pgbench run on both the publisher and the subscriber.

Test setup:
 - Tests performed on pgHead + v16 patches.
 - Created a pub-sub replication system.
 - Parameters for both instances:
     shared_buffers = 30GB
     min_wal_size = 10GB
     max_wal_size = 20GB
     autovacuum = false
     track_commit_timestamp = on   (only for the subscriber)
 - Note: to avoid update_missing/delete_missing conflicts, the tables to be modified on the publisher side were renamed: pgbench_XXX -> pgbench_pub_XXX.
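(Not part of the attached scripts, just for reference: dead tuple accumulation on the subscriber during such a run can be observed with a query along these lines.)

```
-- Illustrative monitoring query (not from the attached scripts): shows how many
-- dead tuples are being retained on the subscriber while the benchmark runs.
SELECT relname, n_live_tup, n_dead_tup
FROM pg_stat_all_tables
WHERE relname LIKE 'pgbench_%'
ORDER BY n_dead_tup DESC;
```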
Test Run:
 - Ran pgbench (read-write) on both the publisher and the subscriber with 30 clients for a duration of 120 seconds, collecting data over 5 runs.
 - Note that pgbench was run against different tables on the publisher and the subscriber.
 (The scripts used for the test, "case1-2_measure.sh" and "case1-2_setup.sh", are attached.)

Results:
  Run#        pub TPS   sub TPS
  1           32209     13704
  2           32378     13684
  3           32720     13680
  4           31483     13681
  5           31773     13813
  Median      32209     13684
  Regression  7%        -53%

Perf analysis: shows time spent on heap table scans -
```
- 68.22% 0.09% postgres postgres [.] ExecModifyTable
  ...
  - 62.86% index_getnext_slot
     - 61.49% index_fetch_heap
        - 61.40% heapam_index_fetch_tuple
           + 41.43% heap_hot_search_buffer
           + 13.11% ReleaseAndReadBuffer
             6.15% heap_page_prune_opt
     + 1.33% index_getnext_tid
  + 4.35% ExecUpdate
```

~~~~

[4] Test with the patch modified as mentioned above (attached top-up patch "0001-wait-for-commit-time").

Test setup:
 - The setup is the same as in [3], only with the code modified as per the "0001-wait-for-commit-time" patch.

Test Run:
 - Ran pgbench (read-write) on both the publisher and the subscriber with 30 clients for a duration of 120 seconds, collecting data over 5 runs.
 - Note that pgbench was run against different tables on the publisher and the subscriber.
 (The scripts used for the test, "case1-2_measure.sh" and "case1-2_setup.sh", are attached.)

Results:
  Run#        pub TPS       sub TPS
  1           29678.49354   13440.01222
  2           29686.43844   13416.32116
  3           29915.70886   13412.92197
  4           29986.15481   13410.23707
  5           30098.37795   13289.21383
  Median      29915.70886   13412.92197
  Regression  3%            -53%

Perf analysis: shows time spent on heap table scans -
```
--56.35%--index_fetch_heap
   --56.04%--heapam_index_fetch_tuple
      --38.72%--heap_hot_search_buffer
      --11.23%--ReleaseAndReadBuffer
```

Data from pg_stat:
```
postgres=# SELECT relname, n_tup_upd, n_tup_hot_upd, n_tup_newpage_upd,
                  n_tup_upd - n_tup_hot_upd AS n_tup_non_hot
           FROM pg_stat_all_tables WHERE relname LIKE 'pgbench_%';
     relname      | n_tup_upd | n_tup_hot_upd | n_tup_newpage_upd | n_tup_non_hot
------------------+-----------+---------------+-------------------+---------------
 pgbench_history  |         0 |             0 |                 0 |             0
 pgbench_tellers  |   1286798 |        934880 |            351918 |        351918
 pgbench_accounts |   1286798 |         28701 |           1258097 |       1258097
 pgbench_branches |   1286797 |       1240577 |             46220 |         46220
(4 rows)
```

~~~~

[5] Test for the effects of enabling hot_standby_feedback and recovery_min_apply_delay on a standby node.

Test setup:
 - All tests done on pgHead (1f0de66ea2).
 - Created one primary node with the configuration:
     autovacuum = false
     shared_buffers = '30GB'
     max_wal_size = 20GB
     min_wal_size = 10GB
 - Initialized the primary node with pgbench initial data using scale=100.
 - Created a standby node for the primary node.

Test runs:
 - Ran pgbench (read-write on the primary and read-only on the standby) with 30 clients for a duration of 120 seconds, collecting data over 5 runs for each case.
 (The scripts used for the test, "case3_measure.sh" and "case3_setup.sh", are attached.)

Results:
No regression was observed in the standby's TPS during these runs; the tables below show the TPS of the primary node.
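(For reference, and not part of the attached scripts: the transaction horizon that hot_standby_feedback holds back on the primary can be inspected with queries like the following, using the physical slot created in the setup script.)

```
-- Illustrative queries (not from the attached scripts), run on the primary:
-- the xmin reported by the standby via hot_standby_feedback limits how far
-- dead tuples can be removed on the primary.
SELECT application_name, backend_xmin, replay_lag
FROM pg_stat_replication;

SELECT slot_name, xmin, restart_lsn
FROM pg_replication_slots
WHERE slot_name = 'standby_1';
```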
a) Data for wal_receiver_status_interval = 1s/10s(default)/100s with hot_standby_feedback = ON

- TPS on the primary when hot_standby_feedback = OFF (baseline):
  Run#    Primary TPS
  1       31482.74898
  2       31396.02757
  3       31418.47558
  4       31485.24074
  5       31489.70499
  Median  31482.74898

- TPS on the primary when hot_standby_feedback = ON, with varying wal_receiver_status_interval (each cell shows the TPS of the primary node):
  Run#        interval=1s   interval=10s   interval=100s
  1           31056.25701   27754.2588     17163.77088
  2           30889.89889   27627.41654    17115.88696
  3           30862.499     27579.77763    17188.18505
  4           30980.84342   27753.04054    17153.41827
  5           31105.98614   27849.44234    17225.25687
  Median      30980.84342   27753.04054    17163.77088
  Regression  -2%           -12%           -45%

b) Data for recovery_min_apply_delay = 30s/60s/90s/1h with hot_standby_feedback = ON (each cell shows the TPS of the primary node; regression is relative to the hot_standby_feedback = OFF baseline in a)):
  Run#        apply_del=30s   apply_del=60s   apply_del=90s   apply_del=1h
  1           17630.58618     15331.08557     13999.01297     13927.10651
  2           17650.56248     15364.88802     14039.52754     14011.92415
  3           17679.18014     15314.12559     14060.43152     13996.42864
  4           17693.89016     15296.77917     14052.06037     14037.38131
  5           17691.42358     15323.18087     14038.56499     13996.08657
  Median      17679.18014     15323.18087     14039.52754     13996.42864
  Regression  -44%            -51%            -55%            -56%

~~~~

--
Thanks,
Nisha
0001-wait-for-commit-time.patch
#!/bin/bash

##################
### Definition ###
##################

## prefix
PUB_PREFIX="/home/hayato/pub_mypostgres/bin"

## scale factor
SCALE=100

## pgbench init command
INIT_COMMAND="pgbench -i -U postgres postgres -s $SCALE"

SOURCE=$1

################
### clean up ###
################

pg_ctl stop -D data_pub -w
pg_ctl stop -D data_sub -w
rm -rf data* *log

#######################
### setup publisher ###
#######################

initdb -D data_pub -U postgres

cat << EOF >> data_pub/postgresql.conf
port=5431
autovacuum = false
shared_buffers = '30GB'
max_wal_size = 20GB
min_wal_size = 10GB
wal_level = logical
EOF

pg_ctl -D data_pub start -w -l pub.log

${PUB_PREFIX}/$INIT_COMMAND -p 5431

psql -U postgres -p 5431 -c "CREATE PUBLICATION pub FOR ALL TABLES;"

########################
### setup subscriber ###
########################

initdb -D data_sub -U postgres

cat << EOF >> data_sub/postgresql.conf
port=5432
autovacuum = false
shared_buffers = '30GB'
max_wal_size = 20GB
min_wal_size = 10GB
track_commit_timestamp = on
# log_min_messages = DEBUG1
EOF

pg_ctl -D data_sub start -w -l sub.log

$INIT_COMMAND -p 5432

(
echo "CREATE TABLE pgbench_pub_history (tid int,bid int,aid bigint,delta int,mtime timestamp,filler char(22));"
echo "CREATE TABLE pgbench_pub_tellers (tid int not null primary key,bid int,tbalance int,filler char(84));"
echo "CREATE TABLE pgbench_pub_accounts (aid bigint not null primary key,bid int,abalance int,filler char(84));"
echo "CREATE TABLE pgbench_pub_branches (bid int not null primary key,bbalance int,filler char(88));"
) | psql -p 5432 -U postgres

if [ $SOURCE = "head" ]
then
    psql -U postgres -p 5432 -c "CREATE SUBSCRIPTION sub CONNECTION 'port=5431 user=postgres' PUBLICATION pub;"
else
    psql -U postgres -p 5432 -c "CREATE SUBSCRIPTION sub CONNECTION 'port=5431 user=postgres' PUBLICATION pub WITH (detect_update_deleted = on);"
fi

sleep 5s
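(A possible refinement, not in the script above: instead of relying on the fixed "sleep 5s", the initial table synchronization could be checked explicitly on the subscriber with a query like the one below.)

```
-- Illustrative check on the subscriber (not part of the script above):
-- every relation of the subscription reaches the 'r' (ready) state once
-- the initial table synchronization has finished.
SELECT srrelid::regclass, srsubstate
FROM pg_subscription_rel;
```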
#!/bin/bash

##################
### Definition ###
##################

## prefix
PUB_PREFIX="/home/hayato/pub_mypostgres/bin"

## Used source
SOURCE=patched

## Number of runs
NUMRUN=1

## Measurement duration
DURATION=120

## Number of clients during a run
NUMCLIENTS=30

###########################
### measure performance ###
###########################

for i in `seq ${NUMRUN}`
do
    # Prepare a clean environment for each measurement
    ./v2_setup.sh $SOURCE

    echo "=================="
    echo "${SOURCE}_${i}.dat"
    echo "=================="

    # Do the actual measurements
    ${PUB_PREFIX}/pgbench -p 5431 -U postgres postgres -c $NUMCLIENTS -j $NUMCLIENTS -T $DURATION > pub_${SOURCE}_${i}.dat &
    pgbench -p 5432 -U postgres postgres -c $NUMCLIENTS -j $NUMCLIENTS -T $DURATION > sub_${SOURCE}_${i}.dat
done
#!/bin/bash

##################
### Definition ###
##################

port_primary=5533
port_standby=5534

## scale factor
SCALE=100

## pgbench init command
INIT_COMMAND="pgbench -i -U postgres postgres -s $SCALE"

################
### clean up ###
################

./pg_ctl stop -D data_primary -w
./pg_ctl stop -D data_standby -w
rm -rf data* *log

#####################
### setup primary ###
#####################

./initdb -D data_primary -U postgres

cat << EOF >> data_primary/postgresql.conf
port=$port_primary
autovacuum = false
shared_buffers = '30GB'
max_wal_size = 20GB
min_wal_size = 10GB
EOF

./pg_ctl -D data_primary start -w -l primary.log

./$INIT_COMMAND -p $port_primary

./psql -p $port_primary -d postgres -U postgres -c "SELECT pg_create_physical_replication_slot('standby_1');"
./psql -p $port_primary -d postgres -U postgres -c "CREATE ROLE replication WITH REPLICATION PASSWORD 'password' LOGIN;"

./pg_ctl -D data_primary restart -l primary.log

#####################
### setup standby ###
#####################

./pg_basebackup -h 127.0.0.1 -D data_standby -R -P -U replication -X s -p $port_primary

cat << EOF >> data_standby/postgresql.conf
port = $port_standby
primary_slot_name = 'standby_1'
hot_standby_feedback = on
wal_receiver_status_interval=100s
EOF

./pg_ctl -D data_standby start -w -l standby.log

sleep 5s
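(For reference, not part of the script above: once the standby is up, the walreceiver status and the flushed WAL position can be checked on the standby with a query like this.)

```
-- Illustrative check on the standby (not part of the script above):
-- confirms the walreceiver is streaming and shows how far WAL has been flushed.
SELECT status, flushed_lsn, latest_end_lsn
FROM pg_stat_wal_receiver;
```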
#!/bin/bash

##################
### Definition ###
##################

port_primary=5533
port_standby=5534

## Used source
SOURCE=hsfOFF_ad1hr

## Number of runs
NUMRUN=5

## Measurement duration
DURATION=120

## Number of clients during a run
NUMCLIENTS=30

###########################
### measure performance ###
###########################

for i in `seq ${NUMRUN}`
do
    # Prepare a clean environment for each measurement
    ./v2_setup_n.sh

    echo "=================="
    echo "${SOURCE}_${i}.dat"
    echo "=================="

    # Do the actual measurements
    ./pgbench -p $port_primary -U postgres postgres -c $NUMCLIENTS -j $NUMCLIENTS -T $DURATION > primary_${SOURCE}_${i}.dat &
    ./pgbench -p $port_standby -U postgres postgres -c $NUMCLIENTS -j $NUMCLIENTS -T $DURATION -b select-only --no-vacuum > standby_${SOURCE}_${i}.dat
done
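(For reference, not part of the script above: during the recovery_min_apply_delay runs, the delay actually in effect on the standby can be confirmed with a query like the following.)

```
-- Illustrative check on the standby (not part of the script above):
-- the replay delay should roughly track the configured recovery_min_apply_delay.
SELECT now() - pg_last_xact_replay_timestamp() AS replay_delay;
```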