RE: Conflict detection for update_deleted in logical replication

Zhijie Hou (Fujitsu) Mon, 07 Jul 2025 03:22:00 -0700

On Mon, Jul 7, 2025 at 11:03 AM Zhijie Hou (Fujitsu) wrote:
> 
> On Sun, Jul 6, 2025 at 10:51 PM Masahiko Sawada wrote:
> > > The workload is mostly the same as [4].
> > >
> > > Workload:
> > > - Initially ran pgbench with 40 clients on *both sides*.
> > > - Set max_conflict_retention_duration = {60, 120}
> > > - When the slot is invalidated on the subscriber side, stop the benchmark
> > >   and wait until the subscriber has caught up. Then halve the number of
> > >   clients on the publisher.
> > >   In this test the conflict slot could be invalidated as expected when the
> > >   workload on the publisher was high, and it would not get invalidated
> > >   anymore after reducing the workload. This shows that even if the slot
> > >   has been invalidated once, users can continue to detect the
> > >   update_deleted conflict by reducing the workload on the publisher.
> > > - Total period of the test was 900s for each case.
> > >
> > > (max_conflixt.tar.gz can run the same workload)
> > >
> > > Observation:
> > >  -
> > >  - Parallelism on the publisher side was reduced 15->7->3, at which point
> > >    the conflict slot was no longer invalidated.
> > >  - TPS on the subscriber side improved when the concurrency was reduced.
> > >    This is because dead tuple accumulation on the subscriber is reduced
> > >    due to the reduced workload on the publisher.
> > >  - When the publisher has Nclients=3, there is no regression in the
> > >    subscriber's TPS.
> >
> > I think that users typically cannot control the amount of workload in
> > production, meaning that once the performance regression starts to
> > happen, the subscriber could enter a loop of invalidating the slot,
> > recovering the performance, re-creating the slot, and hitting the
> > performance problem again.
> 
> Yes, you are right. The test is designed to demonstrate that the slot can be
> invalidated under high workload conditions as expected, while remaining valid
> if the workload is reduced. In production systems where workload reduction
> may not be possible, it’s recommended to disable `retain_conflict_info` to
> enhance performance. This decision involves balancing the need for reliable
> conflict detection with optimal system performance.
> 
> I think hot standby feedback has a similar impact on the performance of the
> primary: it prevents the early removal of data needed by the standby,
> ensuring that the data remains accessible when needed.


For reference, we conducted a test [1] to evaluate the impact of enabling hot
standby feedback in a physical replication setup, and observed a regression of
approximately 50% in TPS on the primary as well.
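
For anyone who wants to try a similar comparison, below is a minimal sketch of
how the setting can be toggled and observed. It assumes an ordinary streaming
replication pair and is not the exact script used in [1]; the same parameter
can equally be set in postgresql.conf on the standby.

```sql
-- On the standby: enable feedback. The parameter only needs a reload,
-- not a restart.
ALTER SYSTEM SET hot_standby_feedback = on;
SELECT pg_reload_conf();

-- On the primary: check the xmin horizon each standby is reporting.
-- A non-NULL backend_xmin means the standby is holding back tuple removal.
SELECT application_name, backend_xmin
FROM pg_stat_replication;
```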

[1] https://www.postgresql.org/message-id/CABdArM4OEwmh_31dQ8_F__VmHwk2ag_M%3DYDD4H%2ByYQBG%2BbHGzg%40mail.gmail.com

Best Regards,
Hou zj
