On Thu, Jul 17, 2025 at 4:44 PM shveta malik <shveta.ma...@gmail.com> wrote:
>
> On Thu, Jul 17, 2025 at 9:56 AM Dilip Kumar <dilipbal...@gmail.com> wrote:
> >
> > On Fri, Jul 11, 2025 at 4:28 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
> > >
> > > On Thu, Jul 10, 2025 at 6:46 PM Masahiko Sawada <sawada.m...@gmail.com> 
> > > wrote:
> > > >
> > > > On Wed, Jul 9, 2025 at 9:09 PM Amit Kapila <amit.kapil...@gmail.com> 
> > > > wrote:
> > > >
> > > > >
> > > > > > I think that even with retain_conflict_info = off, there is 
> > > > > > probably a
> > > > > > point at which the subscriber can no longer keep up with the
> > > > > > publisher. For example, if with retain_conflict_info = off we can
> > > > > > withstand 100 clients running at the same time, then the fact that
> > > > > > this performance degradation occurred with 15 clients explains that
> > > > > > performance degradation is much more likely to occur because of
> > > > > > retain_conflict_info = on.
> > > > > >
> > > > > > Test cases 3 and 4 are typical cases where this feature is used 
> > > > > > since
> > > > > > the  conflicts actually happen on the subscriber, so I think it's
> > > > > > important to look at the performance in these cases. The worst case
> > > > > > scenario for this feature is that when this feature is turned on, 
> > > > > > the
> > > > > > subscriber cannot keep up even with a small load, and with
> > > > > > max_conflict_retention_duration we enter a loop of slot invalidation
> > > > > > and re-creating, which means that conflict cannot be detected
> > > > > > reliably.
> > > > > >
> > > > >
> > > > > As per the above observations, it is less of a regression of this
> > > > > feature but more of a lack of parallel apply or some kind of pre-fetch
> > > > > for apply, as is recently proposed [1]. I feel there are use cases, as
> > > > > explained above, for which this feature would work without any
> > > > > downside, but due to a lack of some sort of parallel apply, we may not
> > > > > be able to use it without any downside for cases where the contention
> > > > > is only on a smaller set of tables. We have not tried it, but even in
> > > > > cases where contention is on a smaller set of tables, if users
> > > > > distribute the workload among different pub-sub pairs by using row
> > > > > filters, we may see less regression there as well. We can try that
> > > > > too.
> > > >
> > > > While I understand that there are some possible solutions we have
> > > > today to reduce the contention, I'm not really sure these are really
> > > > practical solutions as it increases the operational costs instead.
> > > >
> > >
> > > I assume by operational costs you mean defining the replication
> > > definitions such that workload is distributed among multiple apply
> > > workers via subscriptions either by row_filters, or by defining
> > > separate pub-sub pairs of a set of tables, right? If so, I agree with
> > > you but I can't think of a better alternative. Even without this
> > > feature as well, we know in such cases the replication lag could be
> > > large as is evident in recent thread [1] and some offlist feedback by
> > > people using native logical replication. As per a POC in the
> > > thread[1], parallelizing apply or by using some prefetch, we could
> > > reduce the lag but we need to wait for that work to mature to see the
> > > actual effect of it.
> > >
> > > The path I see with this work is to clearly document the cases
> > > (configuration) where this feature could be used without much downside
> > > and keep the default value of subscription option to enable this as
> > > false (which is already the case with the patch). Do you see any
> > > better alternative for moving forward?
> >
> > I was just thinking about what are the most practical use cases where
> > a user would need multiple active writer nodes. Most applications
> > typically function well with a single active writer node. While it's
> > beneficial to have multiple nodes capable of writing for immediate
> > failover (e.g., if the current writer goes down), or they select a
> > primary writer via consensus algorithms like Raft/Paxos, I rarely
> > encounter use cases where users require multiple active writer nodes
> > for scaling write workloads.
>
> Thank you for the feedback. In the scenario with a single writer node
> and a subscriber with RCI enabled, we have not observed any
> regression.  Please refer to the test report at [1], specifically test
> cases 1 and 2, which involve a single writer node. Next, we can test a
> scenario with multiple (2-3) writer nodes publishing changes, and a
> subscriber node subscribing to those writers with RCI enabled, which
> can even serve as a good use case of the conflict detection we are
> targeting through RCI enabling.
>

I did a workload test for the setup as suggested above - "we can test
a scenario with multiple (2-3) writer nodes publishing changes, and a
subscriber node subscribing to those writers with RCI enabled".

Here are the results:

Highlights
==========
- Two tests were done with two different workloads - 15 and 40
concurrent clients, respectively.
- No regression was observed on any of the nodes.

Used source
===========
pgHead commit 62a17a92833 + v47 patch set

Machine details
===============
Intel(R) Xeon(R) CPU E7-4890 v2 @ 2.80GHz, 88 CPU cores, 503 GiB RAM

01. pgbench with 15 clients
========================
Setup:
 - Two publishers and one subscriber:
  pub1 --> sub
  pub2 --> sub
 - All three nodes have the same pgbench tables (scale=60) and are configured with:
    autovacuum = false
    shared_buffers = '30GB'
    -- Also, worker- and logical-replication-related parameters were
increased as needed (see the attached scripts for details).
 - The topology is such that pub1 and pub2 are independent writers. The
sub acts as a reader (no writes) and has subscribed to all the changes
from both pub1 and pub2 (a condensed sketch of the setup is shown
below; the full scripts are attached).
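
For reference, the topology setup boils down to the following, condensed
from the attached setup script (ports 5433/5434 for the publishers and
5435 for the subscriber; retain_conflict_info = on is used only for the
patched runs):

# On each publisher (pub1 on port 5433, pub2 on port 5434):
./psql -U postgres -p 5433 -c "CREATE PUBLICATION pub FOR ALL TABLES;"
./psql -U postgres -p 5434 -c "CREATE PUBLICATION pub FOR ALL TABLES;"

# On the subscriber (port 5435):
./psql -U postgres -p 5435 -c "CREATE SUBSCRIPTION sub1 CONNECTION 'port=5433 user=postgres' PUBLICATION pub WITH (copy_data = false, retain_conflict_info = on);"
./psql -U postgres -p 5435 -c "CREATE SUBSCRIPTION sub2 CONNECTION 'port=5434 user=postgres' PUBLICATION pub WITH (copy_data = false, retain_conflict_info = on);"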

Workload:
 - pgbench (read-write) was run on both pub1 and pub2 (15 clients,
duration = 5 minutes)
 - pgbench (read-only) was run on sub (15 clients, duration = 5 minutes);
the exact invocations are sketched below
 - The measurement was repeated 2 times.
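
The measurement step amounts to the following, condensed from the
attached run script (the output file names here are illustrative; the
40-client case only changes the client count and duration):

./pgbench -p 5433 -U postgres postgres -c 15 -j 15 -T 300 > pub1_15c.dat &
./pgbench -p 5434 -U postgres postgres -c 15 -j 15 -T 300 > pub2_15c.dat &
./pgbench -p 5435 -U postgres postgres -c 15 -j 15 -T 300 -b select-only > sub_15c.dat
wait   # let the background publisher-side runs finish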

Observation:
 - No performance regression was observed on either the writer nodes
(publishers) or the reader node (subscriber) with the patch applied.
 - TPS on both publishers was slightly better than on pgHead. This
could be because all nodes run on the same machine - under high
publisher load, the subscriber's apply worker performs I/O more slowly
due to dead tuple retention, giving publisher-side pgbench more I/O
bandwidth to complete writes. We can investigate further if needed.
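
One simple way to observe the dead-tuple accumulation on the subscriber
during such a run could be a periodic check like the following
(hypothetical, not part of the measurements above):

./psql -U postgres -p 5435 -c "SELECT relname, n_live_tup, n_dead_tup FROM pg_stat_user_tables WHERE relname LIKE 'pgbench%';"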


Detailed Results Table:
On publishers:
#run   pgHead_Pub1_TPS   pgHead_Pub2_TPS   patched_pub1_TPS   patched_pub2_TPS
1   13440.47394   13459.71296   14325.81026   14345.34077
2   13529.29649   13553.65741   14382.32144   14332.94777
median 13484.88521   13506.68518   14354.06585   14339.14427
   - No regression

On subscriber:
#run   pgHead_sub_TPS   patched_sub_TPS
1      127009.0631      126894.9649
2      127767.4083      127207.8632
median 127388.2357      127051.4141
  - No regression

~~~~

02. pgbench with 40 clients
======================
Setup:
 - same as case-01

Workload:
 - pgbench (read-write) was run on both pub1 and pub2 (40 clients,
duration = 10 minutes)
 - pgbench (read-only) was run on sub (40 clients, duration = 10 minutes)
 - The measurement was repeated 2 times.

Observation:
 - No performance regression was observed on either the writer nodes
(publishers) or the reader node (subscriber) with the patch applied.
 - Similar to case-01, TPS on both publishers was slightly higher than
on pgHead.

Detailed Results Table:
On publishers:
#run   pgHead_Pub1_TPS   pgHead_Pub2_TPS   patched_pub1_TPS   patched_pub2_TPS
1      17818.12479       17744.77163       18602.42504        18620.90056
2      17759.3144        17774.47442       18660.44407        18230.63849
median 17788.7196        17759.62302       18631.43455        18425.76952
   - No regression

On subscriber:
#run   pgHead_sub_TPS   patched_sub_TPS
1   281075.3732   279438.4882
2   275988.1383   277388.6316
median 278531.7557   278413.5599
   - No regression

~~~~
The scripts used to perform the above tests are attached.

--
Thanks,
Nisha
#!/bin/bash

##################
### Definition ###
##################

## prefix
##PUB_PREFIX="/home/nisha/pg2/postgres/inst/bin"

## Used source
SOURCE=head

## Number of runs
NUMRUN=2

## Measurement duration
DURATION=600

## Number of clients during a run
NUMCLIENTS=40

###########################
### measure performance ###
###########################

for i in `seq ${NUMRUN}`
do
    # Prepare a clean environment for each measurement
    ./2pub_test_setup.sh $SOURCE

    echo "=================="
    echo "${SOURCE}_${i}.dat"
    echo "=================="

    # Do actual measurements
    ./pgbench -p 5433 -U postgres postgres -c $NUMCLIENTS -j $NUMCLIENTS -T $DURATION > pub1_40c_${SOURCE}_${i}.dat &
    ./pgbench -p 5434 -U postgres postgres -c $NUMCLIENTS -j $NUMCLIENTS -T $DURATION > pub2_40c_${SOURCE}_${i}.dat &
    ./pgbench -p 5435 -U postgres postgres -c $NUMCLIENTS -j $NUMCLIENTS -T $DURATION -b select-only > sub_40c_${SOURCE}_${i}.dat

    # Wait for the background publisher-side runs to finish before the next
    # iteration re-initializes the clusters
    wait
done
#!/bin/bash

##################
### Definition ###
##################

##sleep 5s


port_pub1=5433
port_pub2=5434
port_sub=5435


## scale factor
SCALE=60

## pgbench init command
INIT_COMMAND="./pgbench -i -U postgres postgres -s $SCALE"

SOURCE=$1

################
### clean up ###
################

./pg_ctl stop -m i -D data_pub -w
./pg_ctl stop -m i -D data_pub2 -w
./pg_ctl stop -m i -D data_sub -w
rm -rf data* *log

#######################
### setup publisher 1 ###
#######################

./initdb -D data_pub -U postgres
cat << EOF >> data_pub/postgresql.conf
port=$port_pub1
autovacuum = false
shared_buffers = '30GB'
max_wal_size = 20GB
min_wal_size = 10GB
wal_level = logical
EOF

./pg_ctl -D data_pub start -w -l pub1.log

$INIT_COMMAND -p $port_pub1
./psql -U postgres -p $port_pub1 -c "CREATE PUBLICATION pub FOR ALL TABLES;"


#######################
### setup publisher 2 ###
#######################

./initdb -D data_pub2 -U postgres
cat << EOF >> data_pub2/postgresql.conf
port=$port_pub2
autovacuum = false
shared_buffers = '30GB'
max_wal_size = 20GB
min_wal_size = 10GB
wal_level = logical
EOF

./pg_ctl -D data_pub2 start -w -l pub2.log

$INIT_COMMAND -p $port_pub2
./psql -U postgres -p $port_pub2 -c "CREATE PUBLICATION pub FOR ALL TABLES;"


#######################
### setup subscriber ###
#######################

./initdb -D data_sub -U postgres

cat << EOF >> data_sub/postgresql.conf
port=$port_sub
autovacuum = false
shared_buffers = '30GB'
max_wal_size = 20GB
min_wal_size = 10GB
track_commit_timestamp = on
# log_min_messages = DEBUG1
max_worker_processes = 100
max_logical_replication_workers = 50
#max_parallel_apply_workers_per_subscription = 8
EOF

./pg_ctl -D data_sub start -w -l sub.log
$INIT_COMMAND -p $port_sub


if [ "$SOURCE" = "head" ]
then
    ./psql -U postgres -p $port_sub -c "CREATE SUBSCRIPTION sub1 CONNECTION 'port=$port_pub1 user=postgres' PUBLICATION pub WITH (copy_data=false);"
    ./psql -U postgres -p $port_sub -c "CREATE SUBSCRIPTION sub2 CONNECTION 'port=$port_pub2 user=postgres' PUBLICATION pub WITH (copy_data=false);"
else
    ./psql -U postgres -p $port_sub -c "CREATE SUBSCRIPTION sub1 CONNECTION 'port=$port_pub1 user=postgres' PUBLICATION pub WITH (copy_data= false, retain_conflict_info = on);"
    ./psql -U postgres -p $port_sub -c "CREATE SUBSCRIPTION sub2 CONNECTION 'port=$port_pub2 user=postgres' PUBLICATION pub WITH (copy_data=false, retain_conflict_info = on);"
fi
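
# Optional sanity check: list the replication slots on the subscriber. With
# retain_conflict_info = on, the patch is expected to maintain an extra slot
# for dead-tuple retention, so seeing it confirms the setting took effect.
./psql -U postgres -p $port_sub -c "SELECT slot_name, slot_type, active FROM pg_replication_slots;"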



sleep 5s
