RE: Time delayed LR (WAS Re: logical replication restrictions)

Takamichi Osumi (Fujitsu) Wed, 21 Dec 2022 22:02:12 -0800

Hi,


On Thursday, December 15, 2022 12:53 PM Amit Kapila <amit.kapil...@gmail.com> 
wrote:
> On Thu, Dec 15, 2022 at 7:16 AM Kyotaro Horiguchi <horikyota....@gmail.com>
> wrote:
> >
> > At Wed, 14 Dec 2022 10:46:17 +0000, "Hayato Kuroda (Fujitsu)"
> > <kuroda.hay...@fujitsu.com> wrote in
> > > I have implemented and tested that workers wake up per
> > > wal_receiver_timeout/2 and send keepalive. Basically it works well, but I
> > > found two problems.
> > > Do you have any good suggestions about them?
> > >
> > > 1)
> > >
> > > With this PoC at present, workers calculate sending intervals based
> > > on its wal_receiver_timeout, and it is suppressed when the parameter is
> > > set to zero.
> > >
> > > This means that there is a possibility that walsender is timeout
> > > when wal_sender_timeout in publisher and wal_receiver_timeout in
> > > subscriber is different.
> > > Supposing that wal_sender_timeout is 2min, wal_receiver_timeout is
> > > 5min,
> >
> > It seems to me wal_receiver_status_interval is better for this use.
> > It's enough for us to document that "wal_r_s_interval should be
> > shorter than wal_sender_timeout/2 especially when logical replication
> > connection is using min_apply_delay. Otherwise you will suffer
> > repeated termination of walsender".
> >
> 
> This sounds reasonable to me.
Okay, I changed the wake-up interval to use wal_receiver_status_interval
and added a description of this timeout behavior to the documentation.

FYI, wal_receiver_status_interval by definition specifies
the minimum frequency for the WAL receiver process to send information
to the upstream. So I used the value directly as the WaitLatch timeout.
My wording in the documentation change follows that definition.
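
To make the waiting behavior concrete, here is a minimal Python model of it.
This is only a sketch, not the C code in the patch: the function name
wakeup_offsets and its parameters are hypothetical, and treating an interval
of zero as "one uninterrupted wait" is my assumption for the model
(in PostgreSQL, setting wal_receiver_status_interval to zero disables
status updates).

```python
def wakeup_offsets(delay_ms, status_interval_ms):
    """Return the offsets (ms since the delay started) at which a delayed
    apply worker would wake up and could send feedback to the publisher.

    Hypothetical model: each wait is capped at status_interval_ms, which
    mirrors using wal_receiver_status_interval as the wait timeout.
    """
    if delay_ms <= 0:
        return []  # nothing to wait for
    if status_interval_ms <= 0:
        # Assumption: zero disables periodic feedback, so the worker
        # sleeps through the whole delay in one wait.
        return [delay_ms]
    offsets = []
    elapsed = 0
    while elapsed < delay_ms:
        # Never sleep past the end of the requested delay.
        wait = min(status_interval_ms, delay_ms - elapsed)
        elapsed += wait
        offsets.append(elapsed)
    return offsets
```

With a 25 second delay and a 10 second interval, the model wakes up twice
during the delay and once more when the delay ends.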

> > > and min_apply_delay is 10min. The worker on subscriber will wake up
> > > per 2.5min and send keepalives, but walsender exits before the message
> > > arrives to publisher.
> > >
> > > One idea to avoid that is to send the min_apply_delay subscriber
> > > option to publisher and compare them, but it may be not sufficient.
> > > Because XXX_timeout GUC parameters could be modified later.
> >
> > # Anyway, I don't think such asymmetric setup is preferable.
> >
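The arithmetic of the scenario in 1) can be checked with a small Python
model. This is a deliberately simplified sketch: first_timeout is a
hypothetical helper, it ignores network latency and the keepalive requests
the walsender itself sends, and it assumes the worker sends feedback once
more when the delay ends.

```python
def first_timeout(sender_timeout_s, feedback_interval_s, delay_s):
    """Return the time (s) at which the modeled walsender gives up during
    an apply delay, or None if every feedback message arrives in time."""
    last_heard = 0.0  # last time the publisher heard from the subscriber
    t = 0.0
    while t < delay_s:
        if feedback_interval_s > 0:
            next_feedback = min(t + feedback_interval_s, delay_s)
        else:
            # No periodic feedback: silent until the delay is over.
            next_feedback = delay_s
        if next_feedback - last_heard > sender_timeout_s:
            return last_heard + sender_timeout_s
        last_heard = t = next_feedback
    return None


# Scenario from the quoted mail: wal_sender_timeout = 2min, feedback every
# wal_receiver_timeout/2 = 2.5min, min_apply_delay = 10min.
print(first_timeout(120, 150, 600))
# Same delay, but feedback every 10s (well below wal_sender_timeout/2).
print(first_timeout(120, 10, 600))
```

In this model the first call reports a timeout at 120 seconds, before the
first feedback at 150 seconds, and the second call reports no timeout.
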
> > > 2)
> > >
> > > The issue reported by Vignesh-san[1] has still remained. I have
> > > already analyzed that [2], the root cause is that flushed WAL is not
> > > updated and sent to the publisher. Even if workers send keepalive
> > > messages to pub during the delay, the flushed position cannot be modified.
> >
> > I didn't look closer but the cause I guess is walsender doesn't die
> > until all WAL has been sent, while logical delay chokes replication
> > stream.
For issue (2), a separate thread has been started in [1].
I'll leave any further changes on this point to that thread.

Attached is the updated patch.
As before, for the TAP tests I included a base patch, shared in another
thread [2], that wakes up logical replication workers as needed.

[1] - https://www.postgresql.org/message-id/tyapr01mb586668e50fc2447ad7f92491f5...@tyapr01mb5866.jpnprd01.prod.outlook.com
[2] - https://www.postgresql.org/message-id/flat/20221122004119.GA132961%40nathanxps13


Best Regards,
        Takamichi Osumi

v11-0001-wake-up-logical-workers-as-needed-instead-of-rel.patch
Description: v11-0001-wake-up-logical-workers-as-needed-instead-of-rel.patch

v11-0002-Time-delayed-logical-replication-subscriber.patch
Description: v11-0002-Time-delayed-logical-replication-subscriber.patch
