On Wed, Jan 8, 2025 at 1:53 AM Amit Kapila <amit.kapil...@gmail.com> wrote:
>
> On Wed, Jan 8, 2025 at 3:02 PM Masahiko Sawada <sawada.m...@gmail.com> wrote:
> >
> > On Thu, Dec 19, 2024 at 11:11 PM Nisha Moond <nisha.moond...@gmail.com> wrote:
> > >
> > > Here is further performance test analysis with the v16 patch-set.
> > >
> > > In the test scenarios already shared on -hackers [1], where pgbench was
> > > run only on the publisher node in a pub-sub setup, no performance
> > > degradation was observed on either node.
> > >
> > > In contrast, when pgbench was run only on the subscriber side with
> > > detect_update_deleted=on [2], TPS was reduced due to dead tuple
> > > accumulation. This performance drop depended on
> > > wal_receiver_status_interval: larger intervals resulted in more dead
> > > tuple accumulation on the subscriber node. However, after the
> > > improvement in patch v16-0002, which dynamically tunes the status
> > > request, the default TPS reduction was limited to only 1%.
> > >
> > > We performed more benchmarks with the v16 patches where pgbench was
> > > run on both the publisher and subscriber, focusing on TPS
> > > performance. To summarize the key observations:
> > >
> > > - No performance impact on the publisher, as dead tuple accumulation
> > > does not occur on the publisher.
> >
> > Nice. It means that the subscriber's frequent requests for
> > in-commit-phase transactions didn't have a negative impact on the
> > publisher's performance.
> >
> > > - Performance is reduced on the subscriber side (a TPS reduction of
> > > ~50% [3]) due to dead tuple retention for conflict detection when
> > > detect_update_deleted=on.
> > >
> > > - The performance reduction happens only on the subscriber side, as
> > > the workload on the publisher is quite high and the apply workers
> > > must wait for transactions with earlier timestamps to be applied and
> > > flushed before advancing the non-removable XID to remove dead tuples.
> >
> > Assuming that the performance dip happened due to dead tuple retention
> > for the conflict detection, would TPS on other databases also be
> > affected?
>
> As we use slot->xmin to retain dead tuples, shouldn't the impact be
> global (i.e., on all databases)?

I think so too.
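To make that concrete: a slot's xmin is factored into the dead tuple
horizon for the whole cluster, not just for the subscription's
database. An illustrative way to watch the retained horizon (this uses
only standard pg_replication_slots columns, nothing specific to the
patch):

    -- A slot's xmin/catalog_xmin bound dead tuple removal cluster-wide,
    -- so retention for one subscription is paid for by every database.
    SELECT slot_name, slot_type, database, xmin, catalog_xmin,
           age(xmin) AS xmin_age
    FROM pg_replication_slots;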
> > > [3] Test with pgbench run on both publisher and subscriber.
> > >
> > > Test setup:
> > > - Tests performed on pgHead + v16 patches.
> > > - Created a pub-sub replication system.
> > > - Parameters for both instances were:
> > >
> > >   shared_buffers = 30GB
> > >   min_wal_size = 10GB
> > >   max_wal_size = 20GB
> > >   autovacuum = false
> >
> > Since you disabled autovacuum on the subscriber, dead tuples created
> > by non-HOT updates are accumulated anyway regardless of the
> > detect_update_deleted setting, is that right?
>
> I think the HOT-pruning mechanism during the update operation will
> remove dead tuples even when autovacuum is disabled.

True, but why was autovacuum disabled? It seems that case1-2_setup.sh
doesn't specify a fillfactor, which makes HOT updates less likely to
happen.

I understand that a certain performance dip happens due to dead tuple
retention, which is fine, but I'm surprised that TPS decreased by 50%
within 120 seconds. Does TPS get even worse with a longer test? I did
a quick benchmark where I completely disabled the removal of dead
tuples (via autovacuum = off and a logical slot) and ran pgbench, but
I didn't see such a precipitous dip.
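For reference, the quick benchmark was along these lines (the slot
name, plugin choice, and pgbench options below are illustrative, not
the exact script):

    -- 1. Keep autovacuum from removing dead tuples (autovacuum is a
    --    reloadable GUC, so no restart is needed).
    ALTER SYSTEM SET autovacuum = off;
    SELECT pg_reload_conf();

    -- 2. A logical replication slot, left unconsumed, to keep dead
    --    tuples from being removed as described above (exactly how
    --    the slot was created and used is not shown in the thread).
    SELECT pg_create_logical_replication_slot('retain_horizon',
                                              'test_decoding');

    -- 3. Then, from a shell, run the workload, e.g.:
    --      pgbench -i -s 100 postgres
    --      pgbench -c 16 -j 16 -T 120 postgres

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com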