Here is further performance test analysis with the v16 patch-set.
In the test scenarios already shared on -hackers [1], where pgbench was run only on the publisher node in a pub-sub setup, no performance degradation was observed on either node. In contrast, when pgbench was run only on the subscriber side with detect_update_deleted=on [2], TPS was reduced due to dead tuple accumulation. This performance drop depended on wal_receiver_status_interval: larger intervals resulted in more dead tuple accumulation on the subscriber node. However, after the improvement in patch v16-0002, which dynamically tunes the status request, the default TPS reduction was limited to only 1%.

We performed more benchmarks with the v16 patches where pgbench was run on both the publisher and the subscriber, focusing on TPS performance. To summarize the key observations:
 - No performance impact on the publisher, as dead tuple accumulation does not occur on the publisher.
 - Performance is reduced on the subscriber side (~50% TPS reduction [3]) due to dead tuple retention for conflict detection when detect_update_deleted=on.
 - The reduction happens only on the subscriber side because the workload on the publisher is quite high, and the apply worker must wait for the transactions with earlier timestamps to be applied and flushed before advancing the non-removable XID to remove dead tuples.
 - To validate this further, we modified the patch to check only each transaction's commit_time and advance the non-removable XID if the commit_time is greater than candidate_xid_time. The benchmark results [4] remained consistent, showing a similar performance reduction. This confirms that the performance impact on the subscriber side is reasonable behavior if we want to detect the update_deleted conflict reliably.

We also tested similar scenarios with physical streaming replication, to see the effect of enabling hot_standby_feedback and recovery_min_apply_delay. The benchmark results [5] showed a performance reduction in these cases as well, though the impact was smaller than in the update_deleted scenario because the physical walreceiver does not need to wait for the specified WAL to be applied before sending the hot-standby feedback message. However, as recovery_min_apply_delay increased, a similar TPS reduction (~50%) was observed, aligning with the behavior seen in the update_deleted case.

Based on the above, I think the performance reduction observed with the update_deleted patch is expected and necessary, because the patch's main goal is to retain dead tuples for reliable conflict detection. Reducing this retention period would compromise the accuracy of update_deleted detection.

The detailed benchmark results are as follows:

[1] https://www.postgresql.org/message-id/CABdArM5SpMyGvQTsX0-d%3Db%2BJAh0VQjuoyf9jFqcrQ3JLws5eOw%40mail.gmail.com
[2] https://www.postgresql.org/message-id/TYAPR01MB5692B0182356F041DC9DE3B5F53E2%40TYAPR01MB5692.jpnprd01.prod.outlook.com

[3] Test with pgbench run on both the publisher and the subscriber.

Test setup:
 - Tests performed on pgHead + v16 patches.
 - Created a pub-sub replication system.
 - Parameters for both instances:
     shared_buffers = 30GB
     min_wal_size = 10GB
     max_wal_size = 20GB
     autovacuum = false
     track_commit_timestamp = on   (only for the subscriber)
 - Note: to avoid update_missing/delete_missing conflicts, the tables to be modified on the publisher side were renamed: pgbench_XXX -> pgbench_pub_XXX.
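(Not part of the attached scripts, just for reference: dead tuple accumulation on the subscriber during such a run can be observed with a query along these lines.)

```
-- Illustrative monitoring query (not from the attached scripts): shows how many
-- dead tuples are being retained on the subscriber while the benchmark runs.
SELECT relname, n_live_tup, n_dead_tup
FROM pg_stat_all_tables
WHERE relname LIKE 'pgbench_%'
ORDER BY n_dead_tup DESC;
```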
Test Run:
 - Ran pgbench (read-write) on both the publisher and the subscriber with 30 clients for a duration of 120 seconds, collecting data over 5 runs.
 - Note that pgbench was run against different tables on the publisher and the subscriber.
 (The scripts used for the test, "case1-2_measure.sh" and "case1-2_setup.sh", are attached.)

Results:
  Run#        pub TPS   sub TPS
  1           32209     13704
  2           32378     13684
  3           32720     13680
  4           31483     13681
  5           31773     13813
  Median      32209     13684
  Regression  7%        -53%

Perf analysis: shows time spent on heap table scans -
```
- 68.22% 0.09% postgres postgres [.] ExecModifyTable
  ...
  - 62.86% index_getnext_slot
     - 61.49% index_fetch_heap
        - 61.40% heapam_index_fetch_tuple
           + 41.43% heap_hot_search_buffer
           + 13.11% ReleaseAndReadBuffer
             6.15% heap_page_prune_opt
     + 1.33% index_getnext_tid
  + 4.35% ExecUpdate
```

~~~~

[4] Test with the patch modified as mentioned above (attached top-up patch "0001-wait-for-commit-time").

Test setup:
 - The setup is the same as in [3], only with the code modified as per the "0001-wait-for-commit-time" patch.

Test Run:
 - Ran pgbench (read-write) on both the publisher and the subscriber with 30 clients for a duration of 120 seconds, collecting data over 5 runs.
 - Note that pgbench was run against different tables on the publisher and the subscriber.
 (The scripts used for the test, "case1-2_measure.sh" and "case1-2_setup.sh", are attached.)

Results:
  Run#        pub TPS       sub TPS
  1           29678.49354   13440.01222
  2           29686.43844   13416.32116
  3           29915.70886   13412.92197
  4           29986.15481   13410.23707
  5           30098.37795   13289.21383
  Median      29915.70886   13412.92197
  Regression  3%            -53%

Perf analysis: shows time spent on heap table scans -
```
--56.35%--index_fetch_heap
   --56.04%--heapam_index_fetch_tuple
      --38.72%--heap_hot_search_buffer
      --11.23%--ReleaseAndReadBuffer
```

Data from pg_stat:
```
postgres=# SELECT relname, n_tup_upd, n_tup_hot_upd, n_tup_newpage_upd,
                  n_tup_upd - n_tup_hot_upd AS n_tup_non_hot
           FROM pg_stat_all_tables WHERE relname LIKE 'pgbench_%';
     relname      | n_tup_upd | n_tup_hot_upd | n_tup_newpage_upd | n_tup_non_hot
------------------+-----------+---------------+-------------------+---------------
 pgbench_history  |         0 |             0 |                 0 |             0
 pgbench_tellers  |   1286798 |        934880 |            351918 |        351918
 pgbench_accounts |   1286798 |         28701 |           1258097 |       1258097
 pgbench_branches |   1286797 |       1240577 |             46220 |         46220
(4 rows)
```

~~~~

[5] Test for the effects of enabling hot_standby_feedback and recovery_min_apply_delay on a standby node.

Test setup:
 - All tests done on pgHead (1f0de66ea2).
 - Created one primary node with the configuration:
     autovacuum = false
     shared_buffers = '30GB'
     max_wal_size = 20GB
     min_wal_size = 10GB
 - Initialized the primary node with pgbench initial data using scale=100.
 - Created a standby node for the primary node.

Test runs:
 - Ran pgbench (read-write on the primary and read-only on the standby) with 30 clients for a duration of 120 seconds, collecting data over 5 runs for each case.
 (The scripts used for the test, "case3_measure.sh" and "case3_setup.sh", are attached.)

Results:
No regression was observed in the standby's TPS during these runs; the tables below show the TPS of the primary node.
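(For reference, and not part of the attached scripts: the transaction horizon that hot_standby_feedback holds back on the primary can be inspected with queries like the following, using the physical slot created in the setup script.)

```
-- Illustrative queries (not from the attached scripts), run on the primary:
-- the xmin reported by the standby via hot_standby_feedback limits how far
-- dead tuples can be removed on the primary.
SELECT application_name, backend_xmin, replay_lag
FROM pg_stat_replication;

SELECT slot_name, xmin, restart_lsn
FROM pg_replication_slots
WHERE slot_name = 'standby_1';
```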
a) Data for wal_receiver_status_interval = 1s/10s(default)/100s with hot_standby_feedback = ON

- TPS on the primary when hot_standby_feedback = OFF (baseline):
  Run#    Primary TPS
  1       31482.74898
  2       31396.02757
  3       31418.47558
  4       31485.24074
  5       31489.70499
  Median  31482.74898

- TPS on the primary when hot_standby_feedback = ON, with varying wal_receiver_status_interval (each cell shows the TPS of the primary node):
  Run#        interval=1s   interval=10s   interval=100s
  1           31056.25701   27754.2588     17163.77088
  2           30889.89889   27627.41654    17115.88696
  3           30862.499     27579.77763    17188.18505
  4           30980.84342   27753.04054    17153.41827
  5           31105.98614   27849.44234    17225.25687
  Median      30980.84342   27753.04054    17163.77088
  Regression  -2%           -12%           -45%

b) Data for recovery_min_apply_delay = 30s/60s/90s/1h with hot_standby_feedback = ON (each cell shows the TPS of the primary node; regression is relative to the hot_standby_feedback = OFF baseline in a)):
  Run#        apply_del=30s   apply_del=60s   apply_del=90s   apply_del=1h
  1           17630.58618     15331.08557     13999.01297     13927.10651
  2           17650.56248     15364.88802     14039.52754     14011.92415
  3           17679.18014     15314.12559     14060.43152     13996.42864
  4           17693.89016     15296.77917     14052.06037     14037.38131
  5           17691.42358     15323.18087     14038.56499     13996.08657
  Median      17679.18014     15323.18087     14039.52754     13996.42864
  Regression  -44%            -51%            -55%            -56%

~~~~

--
Thanks,
Nisha
0001-wait-for-commit-time.patch
#!/bin/bash

##################
### Definition ###
##################

## prefix
PUB_PREFIX="/home/hayato/pub_mypostgres/bin"

## scale factor
SCALE=100

## pgbench init command
INIT_COMMAND="pgbench -i -U postgres postgres -s $SCALE"

SOURCE=$1

################
### clean up ###
################

pg_ctl stop -D data_pub -w
pg_ctl stop -D data_sub -w
rm -rf data* *log

#######################
### setup publisher ###
#######################

initdb -D data_pub -U postgres

cat << EOF >> data_pub/postgresql.conf
port=5431
autovacuum = false
shared_buffers = '30GB'
max_wal_size = 20GB
min_wal_size = 10GB
wal_level = logical
EOF

pg_ctl -D data_pub start -w -l pub.log

${PUB_PREFIX}/$INIT_COMMAND -p 5431

psql -U postgres -p 5431 -c "CREATE PUBLICATION pub FOR ALL TABLES;"

########################
### setup subscriber ###
########################

initdb -D data_sub -U postgres

cat << EOF >> data_sub/postgresql.conf
port=5432
autovacuum = false
shared_buffers = '30GB'
max_wal_size = 20GB
min_wal_size = 10GB
track_commit_timestamp = on
# log_min_messages = DEBUG1
EOF

pg_ctl -D data_sub start -w -l sub.log

$INIT_COMMAND -p 5432

(
echo "CREATE TABLE pgbench_pub_history (tid int,bid int,aid bigint,delta int,mtime timestamp,filler char(22));"
echo "CREATE TABLE pgbench_pub_tellers (tid int not null primary key,bid int,tbalance int,filler char(84));"
echo "CREATE TABLE pgbench_pub_accounts (aid bigint not null primary key,bid int,abalance int,filler char(84));"
echo "CREATE TABLE pgbench_pub_branches (bid int not null primary key,bbalance int,filler char(88));"
) | psql -p 5432 -U postgres

if [ $SOURCE = "head" ]
then
    psql -U postgres -p 5432 -c "CREATE SUBSCRIPTION sub CONNECTION 'port=5431 user=postgres' PUBLICATION pub;"
else
    psql -U postgres -p 5432 -c "CREATE SUBSCRIPTION sub CONNECTION 'port=5431 user=postgres' PUBLICATION pub WITH (detect_update_deleted = on);"
fi

sleep 5s
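(A possible refinement, not in the script above: instead of relying on the fixed "sleep 5s", the initial table synchronization could be checked explicitly on the subscriber with a query like the one below.)

```
-- Illustrative check on the subscriber (not part of the script above):
-- every relation of the subscription reaches the 'r' (ready) state once
-- the initial table synchronization has finished.
SELECT srrelid::regclass, srsubstate
FROM pg_subscription_rel;
```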
#!/bin/bash

##################
### Definition ###
##################

## prefix
PUB_PREFIX="/home/hayato/pub_mypostgres/bin"

## Used source
SOURCE=patched

## Number of runs
NUMRUN=1

## Measurement duration
DURATION=120

## Number of clients during a run
NUMCLIENTS=30

###########################
### measure performance ###
###########################

for i in `seq ${NUMRUN}`
do
    # Prepare a clean environment for each measurement
    ./v2_setup.sh $SOURCE

    echo "=================="
    echo "${SOURCE}_${i}.dat"
    echo "=================="

    # Do the actual measurements
    ${PUB_PREFIX}/pgbench -p 5431 -U postgres postgres -c $NUMCLIENTS -j $NUMCLIENTS -T $DURATION > pub_${SOURCE}_${i}.dat &
    pgbench -p 5432 -U postgres postgres -c $NUMCLIENTS -j $NUMCLIENTS -T $DURATION > sub_${SOURCE}_${i}.dat
done
#!/bin/bash

##################
### Definition ###
##################

port_primary=5533
port_standby=5534

## scale factor
SCALE=100

## pgbench init command
INIT_COMMAND="pgbench -i -U postgres postgres -s $SCALE"

################
### clean up ###
################

./pg_ctl stop -D data_primary -w
./pg_ctl stop -D data_standby -w
rm -rf data* *log

#####################
### setup primary ###
#####################

./initdb -D data_primary -U postgres

cat << EOF >> data_primary/postgresql.conf
port=$port_primary
autovacuum = false
shared_buffers = '30GB'
max_wal_size = 20GB
min_wal_size = 10GB
EOF

./pg_ctl -D data_primary start -w -l primary.log

./$INIT_COMMAND -p $port_primary

./psql -p $port_primary -d postgres -U postgres -c "SELECT pg_create_physical_replication_slot('standby_1');"
./psql -p $port_primary -d postgres -U postgres -c "CREATE ROLE replication WITH REPLICATION PASSWORD 'password' LOGIN;"

./pg_ctl -D data_primary restart -l primary.log

#####################
### setup standby ###
#####################

./pg_basebackup -h 127.0.0.1 -D data_standby -R -P -U replication -X s -p $port_primary

cat << EOF >> data_standby/postgresql.conf
port = $port_standby
primary_slot_name = 'standby_1'
hot_standby_feedback = on
wal_receiver_status_interval=100s
EOF

./pg_ctl -D data_standby start -w -l standby.log

sleep 5s
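(For reference, not part of the script above: once the standby is up, the walreceiver status and the flushed WAL position can be checked on the standby with a query like this.)

```
-- Illustrative check on the standby (not part of the script above):
-- confirms the walreceiver is streaming and shows how far WAL has been flushed.
SELECT status, flushed_lsn, latest_end_lsn
FROM pg_stat_wal_receiver;
```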
#!/bin/bash

##################
### Definition ###
##################

port_primary=5533
port_standby=5534

## Used source
SOURCE=hsfOFF_ad1hr

## Number of runs
NUMRUN=5

## Measurement duration
DURATION=120

## Number of clients during a run
NUMCLIENTS=30

###########################
### measure performance ###
###########################

for i in `seq ${NUMRUN}`
do
    # Prepare a clean environment for each measurement
    ./v2_setup_n.sh

    echo "=================="
    echo "${SOURCE}_${i}.dat"
    echo "=================="

    # Do the actual measurements
    ./pgbench -p $port_primary -U postgres postgres -c $NUMCLIENTS -j $NUMCLIENTS -T $DURATION > primary_${SOURCE}_${i}.dat &
    ./pgbench -p $port_standby -U postgres postgres -c $NUMCLIENTS -j $NUMCLIENTS -T $DURATION -b select-only --no-vacuum > standby_${SOURCE}_${i}.dat
done
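(For reference, not part of the script above: during the recovery_min_apply_delay runs, the delay actually in effect on the standby can be confirmed with a query like the following.)

```
-- Illustrative check on the standby (not part of the script above):
-- the replay delay should roughly track the configured recovery_min_apply_delay.
SELECT now() - pg_last_xact_replay_timestamp() AS replay_delay;
```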