On Thu, Jul 17, 2025 at 4:44 PM shveta malik <shveta.ma...@gmail.com> wrote:
>
> On Thu, Jul 17, 2025 at 9:56 AM Dilip Kumar <dilipbal...@gmail.com> wrote:
> >
> > On Fri, Jul 11, 2025 at 4:28 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
> > >
> > > On Thu, Jul 10, 2025 at 6:46 PM Masahiko Sawada <sawada.m...@gmail.com>
> > > wrote:
> > > >
> > > > On Wed, Jul 9, 2025 at 9:09 PM Amit Kapila <amit.kapil...@gmail.com>
> > > > wrote:
> > > > >
> > > > > >
> > > > > > I think that even with retain_conflict_info = off, there is probably a
> > > > > > point at which the subscriber can no longer keep up with the
> > > > > > publisher. For example, if with retain_conflict_info = off we can
> > > > > > withstand 100 clients running at the same time, then the fact that
> > > > > > this performance degradation occurred with 15 clients shows that
> > > > > > performance degradation is much more likely to occur because of
> > > > > > retain_conflict_info = on.
> > > > > >
> > > > > > Test cases 3 and 4 are typical cases where this feature is used, since
> > > > > > the conflicts actually happen on the subscriber, so I think it's
> > > > > > important to look at the performance in these cases. The worst-case
> > > > > > scenario for this feature is that when this feature is turned on, the
> > > > > > subscriber cannot keep up even with a small load, and with
> > > > > > max_conflict_retention_duration we enter a loop of slot invalidation
> > > > > > and re-creation, which means that conflicts cannot be detected
> > > > > > reliably.
> > > > > >
> > > > >
> > > > > As per the above observations, it is less of a regression of this
> > > > > feature but more of a lack of parallel apply or some kind of pre-fetch
> > > > > for apply, as is recently proposed [1]. I feel there are use cases, as
> > > > > explained above, for which this feature would work without any
> > > > > downside, but due to a lack of some sort of parallel apply, we may not
> > > > > be able to use it without any downside for cases where the contention
> > > > > is only on a smaller set of tables. We have not tried it, but in cases
> > > > > where contention is on a smaller set of tables, if users distribute
> > > > > the workload among different pub-sub pairs by using row filters, we
> > > > > may see less regression there as well. We can try that too.
> > > >
> > > > While I understand that there are some possible solutions we have
> > > > today to reduce the contention, I'm not really sure these are really
> > > > practical solutions as they increase the operational costs instead.
> > > >
> > >
> > > I assume by operational costs you mean defining the replication
> > > setup such that the workload is distributed among multiple apply
> > > workers via subscriptions, either by row_filters or by defining
> > > separate pub-sub pairs for a set of tables, right? If so, I agree with
> > > you, but I can't think of a better alternative. Even without this
> > > feature, we know the replication lag could be large in such cases, as
> > > is evident in a recent thread [1] and some offlist feedback by
> > > people using native logical replication. As per a POC in the
> > > thread [1], by parallelizing apply or by using some prefetch, we could
> > > reduce the lag, but we need to wait for that work to mature to see its
> > > actual effect.
> > >
> > > The path I see with this work is to clearly document the cases
> > > (configurations) where this feature could be used without much downside
> > > and keep the default value of the subscription option to enable this as
> > > false (which is already the case with the patch). Do you see any
> > > better alternative for moving forward?
> >
> > I was just thinking about the most practical use cases where
> > a user would need multiple active writer nodes. Most applications
> > typically function well with a single active writer node. While it's
> > beneficial to have multiple nodes capable of writing for immediate
> > failover (e.g., if the current writer goes down), or to select a
> > primary writer via consensus algorithms like Raft/Paxos, I rarely
> > encounter use cases where users require multiple active writer nodes
> > for scaling write workloads.
>
> Thank you for the feedback. In the scenario with a single writer node
> and a subscriber with RCI enabled, we have not observed any
> regression. Please refer to the test report at [1], specifically test
> cases 1 and 2, which involve a single writer node. Next, we can test a
> scenario with multiple (2-3) writer nodes publishing changes, and a
> subscriber node subscribing to those writers with RCI enabled, which
> can even serve as a good use case for the conflict detection we are
> targeting with RCI enabled.
>
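As an aside on the row-filter based distribution idea quoted above, a rough
sketch of what such a split could look like is below. The table choice and
the aid-based predicate are only illustrative, not something we have
benchmarked:

  -- On the publisher: two publications covering disjoint slices of the same
  -- table via row filters.
  CREATE PUBLICATION pub_even FOR TABLE pgbench_accounts WHERE (aid % 2 = 0);
  CREATE PUBLICATION pub_odd  FOR TABLE pgbench_accounts WHERE (aid % 2 = 1);

  -- On the subscriber: one subscription (and hence one apply worker) per slice.
  CREATE SUBSCRIPTION sub_even CONNECTION 'port=5433 user=postgres'
      PUBLICATION pub_even WITH (copy_data = false);
  CREATE SUBSCRIPTION sub_odd CONNECTION 'port=5433 user=postgres'
      PUBLICATION pub_odd WITH (copy_data = false);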
I did a workload test for the setup as suggested above - "we can test a
scenario with multiple (2-3) writer nodes publishing changes, and a
subscriber node subscribing to those writers with RCI enabled". Here are
the results:

Highlights
==========
- Two tests were done with two different workloads - 15 and 40 concurrent
  clients, respectively.
- No regression was observed on any of the nodes.

Used source
===========
pgHead commit 62a17a92833 + v47 patch set

Machine details
===============
Intel(R) Xeon(R) CPU E7-4890 v2 @ 2.80GHz
CPU(s): 88 cores
RAM: 503 GiB

01. pgbench with 15 clients
===========================
Setup:
- Two publishers and one subscriber:
   pub1 --> sub
   pub2 --> sub
- All three nodes have the same pgbench tables (scale=60) and are
  configured with:
   autovacuum = false
   shared_buffers = '30GB'
  Also, worker and logical replication related parameters were increased
  as per requirement (see attached scripts for details).
- The topology is such that pub1 & pub2 are independent writers. The sub
  acts as a reader (no writes) and has subscribed to all the changes from
  both pub1 and pub2.

Workload:
- pgbench (read-write) was run on both pub1 and pub2 (15 clients,
  duration = 5 minutes)
- pgbench (read-only) was run on sub (15 clients, duration = 5 minutes)
- The measurement was repeated 2 times.

Observations:
- No performance regression was observed on either the writer nodes
  (publishers) or the reader node (subscriber) with the patch applied.
- TPS on both publishers was slightly better than on pgHead. This could be
  because all nodes run on the same machine - under high publisher load,
  the subscriber's apply worker performs I/O more slowly due to dead tuple
  retention, giving the publisher-side pgbench more I/O bandwidth to
  complete writes. We can investigate further if needed.

Detailed Results Table:

On publishers:
#run    pgHead_Pub1_TPS  pgHead_Pub2_TPS  patched_pub1_TPS  patched_pub2_TPS
1       13440.47394      13459.71296      14325.81026       14345.34077
2       13529.29649      13553.65741      14382.32144       14332.94777
median  13484.88521      13506.68518      14354.06585       14339.14427
- No regression

On subscriber:
#run    pgHead_sub_TPS  patched_sub_TPS
1       127009.0631     126894.9649
2       127767.4083     127207.8632
median  127388.2357     127051.4141
- No regression

~~~~

02. pgbench with 40 clients
===========================
Setup:
- same as case-01

Workload:
- pgbench (read-write) was run on both pub1 and pub2 (40 clients,
  duration = 10 minutes)
- pgbench (read-only) was run on sub (40 clients, duration = 10 minutes)
- The measurement was repeated 2 times.

Observations:
- No performance regression was observed on the writer nodes (the
  publishers) or the reader node (the subscriber) with the patch applied.
- Similar to case-01, TPS on both publishers was slightly higher than on
  pgHead.

Detailed Results Table:

On publishers:
#run    pgHead_Pub1_TPS  patched_pub1_TPS  pgHead_Pub2_TPS  patched_pub2_TPS
1       17818.12479      18602.42504       17744.77163      18620.90056
2       17759.3144       18660.44407       17774.47442      18230.63849
median  17788.7196       18631.43455       17759.62302      18425.76952
- No regression

On subscriber:
#run    pgHead_sub_TPS  patched_sub_TPS
1       281075.3732     279438.4882
2       275988.1383     277388.6316
median  278531.7557     278413.5599
- No regression

~~~~

The scripts used to perform the above tests are attached.

--
Thanks,
Nisha
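P.S. - In case it helps anyone reproducing this, the apply lag on each
publisher can be sanity-checked during the runs with a query along these
lines (standard views only; the slot names are simply whatever the
subscriptions created by the attached setup script use):

  SELECT slot_name,
         pg_size_pretty(pg_current_wal_lsn() - confirmed_flush_lsn) AS apply_lag
  FROM pg_replication_slots
  WHERE slot_type = 'logical';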
#!/bin/bash

##################
### Definition ###
##################

## prefix
##PUB_PREFIX="/home/nisha/pg2/postgres/inst/bin"

## Used source
SOURCE=head

## Number of runs
NUMRUN=2

## Measurement duration
DURATION=600

## Number of clients during a run
NUMCLIENTS=40

###########################
### measure performance ###
###########################

for i in `seq ${NUMRUN}`
do
    # Prepare a clean environment for each measurement
    ./2pub_test_setup.sh $SOURCE

    echo "=================="
    echo "${SOURCE}_${i}.dat"
    echo "=================="

    # Do actual measurements: read-write pgbench on both publishers (in the
    # background) and read-only pgbench on the subscriber, all for $DURATION
    ./pgbench -p 5433 -U postgres postgres -c $NUMCLIENTS -j $NUMCLIENTS -T $DURATION > pub1_40c_${SOURCE}_${i}.dat &
    ./pgbench -p 5434 -U postgres postgres -c $NUMCLIENTS -j $NUMCLIENTS -T $DURATION > pub2_40c_${SOURCE}_${i}.dat &
    ./pgbench -p 5435 -U postgres postgres -c $NUMCLIENTS -j $NUMCLIENTS -T $DURATION -b select-only > sub_40c_${SOURCE}_${i}.dat
done
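(Not part of the attached scripts - a small helper along these lines can be
used to pull the tps figures out of the pgbench output files produced by the
runner script above; the file-name pattern just mirrors that script:)

  #!/bin/bash
  # Hypothetical post-processing helper: print the TPS reported by pgbench
  # for every result file produced by the runner script.
  for f in pub1_40c_*.dat pub2_40c_*.dat sub_40c_*.dat
  do
      tps=$(grep -oP 'tps = \K[0-9.]+' "$f" | tail -n 1)
      printf '%s\t%s\n' "$f" "$tps"
  done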
#!/bin/bash

##################
### Definition ###
##################

##sleep 5s
port_pub1=5433
port_pub2=5434
port_sub=5435

## scale factor
SCALE=60

## pgbench init command
INIT_COMMAND="./pgbench -i -U postgres postgres -s $SCALE"

SOURCE=$1

################
### clean up ###
################

./pg_ctl stop -m i -D data_pub -w
./pg_ctl stop -m i -D data_pub2 -w
./pg_ctl stop -m i -D data_sub -w

rm -rf data* *log

#########################
### setup publisher 1 ###
#########################

./initdb -D data_pub -U postgres

cat << EOF >> data_pub/postgresql.conf
port=$port_pub1
autovacuum = false
shared_buffers = '30GB'
max_wal_size = 20GB
min_wal_size = 10GB
wal_level = logical
EOF

./pg_ctl -D data_pub start -w -l pub1.log
$INIT_COMMAND -p $port_pub1

./psql -U postgres -p $port_pub1 -c "CREATE PUBLICATION pub FOR ALL TABLES;"

#########################
### setup publisher 2 ###
#########################

./initdb -D data_pub2 -U postgres

cat << EOF >> data_pub2/postgresql.conf
port=$port_pub2
autovacuum = false
shared_buffers = '30GB'
max_wal_size = 20GB
min_wal_size = 10GB
wal_level = logical
EOF

./pg_ctl -D data_pub2 start -w -l pub2.log
$INIT_COMMAND -p $port_pub2

./psql -U postgres -p $port_pub2 -c "CREATE PUBLICATION pub FOR ALL TABLES;"

########################
### setup subscriber ###
########################

./initdb -D data_sub -U postgres

cat << EOF >> data_sub/postgresql.conf
port=$port_sub
autovacuum = false
shared_buffers = '30GB'
max_wal_size = 20GB
min_wal_size = 10GB
track_commit_timestamp = on
# log_min_messages = DEBUG1
max_worker_processes = 100
max_logical_replication_workers = 50
#max_parallel_apply_workers_per_subscription = 8
EOF

./pg_ctl -D data_sub start -w -l sub.log
$INIT_COMMAND -p $port_sub

# Create the subscriptions; retain_conflict_info is enabled only for the
# patched runs ($SOURCE != "head")
if [ $SOURCE = "head" ]
then
    ./psql -U postgres -p $port_sub -c "CREATE SUBSCRIPTION sub1 CONNECTION 'port=$port_pub1 user=postgres' PUBLICATION pub WITH (copy_data = false);"
    ./psql -U postgres -p $port_sub -c "CREATE SUBSCRIPTION sub2 CONNECTION 'port=$port_pub2 user=postgres' PUBLICATION pub WITH (copy_data = false);"
else
    ./psql -U postgres -p $port_sub -c "CREATE SUBSCRIPTION sub1 CONNECTION 'port=$port_pub1 user=postgres' PUBLICATION pub WITH (copy_data = false, retain_conflict_info = on);"
    ./psql -U postgres -p $port_sub -c "CREATE SUBSCRIPTION sub2 CONNECTION 'port=$port_pub2 user=postgres' PUBLICATION pub WITH (copy_data = false, retain_conflict_info = on);"
fi

sleep 5s
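(Also not part of the attachment - after the setup script finishes, a quick
check along these lines on the subscriber can confirm that both apply workers
are up before starting the pgbench runs; it only uses the standard
pg_stat_subscription view:)

  ./psql -U postgres -p $port_sub -c "SELECT subname, pid, last_msg_receipt_time FROM pg_stat_subscription;"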