Hi,
On Thursday, December 15, 2022 12:53 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
> On Thu, Dec 15, 2022 at 7:16 AM Kyotaro Horiguchi <horikyota....@gmail.com> wrote:
> >
> > At Wed, 14 Dec 2022 10:46:17 +0000, "Hayato Kuroda (Fujitsu)" <kuroda.hay...@fujitsu.com> wrote in
> > > I have implemented and tested that workers wake up per
> > > wal_receiver_timeout/2 and send keepalive. Basically it works well,
> > > but I found two problems.
> > > Do you have any good suggestions about them?
> > >
> > > 1)
> > >
> > > With this PoC at present, workers calculate sending intervals based
> > > on their wal_receiver_timeout, and sending is suppressed when the
> > > parameter is set to zero.
> > >
> > > This means that there is a possibility that walsender times out
> > > when wal_sender_timeout in publisher and wal_receiver_timeout in
> > > subscriber are different.
> > > Supposing that wal_sender_timeout is 2min, wal_receiver_timeout is
> > > 5min,
> >
> > It seems to me wal_receiver_status_interval is better for this use.
> > It's enough for us to document that "wal_r_s_interval should be
> > shorter than wal_sender_timeout/2 especially when logical replication
> > connection is using min_apply_delay. Otherwise you will suffer
> > repeated termination of walsender".
> >
>
> This sounds reasonable to me.

Okay, I changed the time interval to wal_receiver_status_interval and
added this description about the timeout.

FYI, wal_receiver_status_interval by definition specifies the minimum
frequency for the WAL receiver process to send information to the
upstream, so I used the value for WaitLatch directly. My documentation
changes follow that definition.

> > > and min_apply_delay is 10min. The worker on subscriber will wake up
> > > per 2.5min and send keepalives, but walsender exits before the
> > > message arrives to publisher.
> > >
> > > One idea to avoid that is to send the min_apply_delay subscriber
> > > option to publisher and compare them, but it may be not sufficient.
> > > Because XXX_timeout GUC parameters could be modified later.
> >
> > # Anyway, I don't think such asymmetric setup is preferable.
> >
> > > 2)
> > >
> > > The issue reported by Vignesh-san[1] has still remained. I have
> > > already analyzed that [2], the root cause is that flushed WAL is not
> > > updated and sent to the publisher. Even if workers send keepalive
> > > messages to pub during the delay, the flushed position cannot be
> > > modified.
> >
> > I didn't look closer but the cause I guess is walsender doesn't die
> > until all WAL has been sent, while logical delay chokes replication
> > stream.

For issue (2), a new thread has been created independently of this
thread in [1]. I'll leave any further changes on this point to that
thread.

Attached is the updated patch. As before, for the TAP tests I used the
base patch shared in another thread [2], which wakes up logical
replication workers as needed.

[1] - https://www.postgresql.org/message-id/tyapr01mb586668e50fc2447ad7f92491f5...@tyapr01mb5866.jpnprd01.prod.outlook.com
[2] - https://www.postgresql.org/message-id/flat/20221122004119.GA132961%40nathanxps13

Best Regards,
Takamichi Osumi
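As an aside on the timeout discussion in this message, the constraint can be checked with simple arithmetic. The sketch below is a simplified model, not PostgreSQL code: the function names are hypothetical, and it assumes that a delaying worker sends feedback exactly once per interval and that the walsender gives up once wal_sender_timeout elapses without receiving any message.

```python
def walsender_times_out(wal_sender_timeout_s, feedback_interval_s):
    """Return True if feedback from the subscriber would arrive too late.

    Simplified model (assumption): while apply is delayed, the only
    messages reaching the walsender are the periodic feedback messages.
    A value of 0 disables wal_sender_timeout; a feedback interval of 0
    means no periodic feedback is sent at all.
    """
    if wal_sender_timeout_s == 0:
        return False  # timeout mechanism disabled on the publisher
    if feedback_interval_s == 0:
        return True   # nothing is sent during the delay
    return feedback_interval_s >= wal_sender_timeout_s


def satisfies_suggested_rule(wal_sender_timeout_s, status_interval_s):
    """Check the suggested documentation rule:
    wal_receiver_status_interval < wal_sender_timeout / 2."""
    if wal_sender_timeout_s == 0:
        return True
    if status_interval_s == 0:
        return False
    return status_interval_s < wal_sender_timeout_s / 2


# Scenario from the thread: wal_sender_timeout = 2min, and the worker
# wakes up every wal_receiver_timeout/2 = 2.5min.
print(walsender_times_out(120, 150))

# Same publisher, with feedback driven by wal_receiver_status_interval = 10s.
print(walsender_times_out(120, 10), satisfies_suggested_rule(120, 10))
```

The factor of two in the rule leaves headroom for one late or lost feedback message before the walsender's timeout expires.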
v11-0001-wake-up-logical-workers-as-needed-instead-of-rel.patch
v11-0002-Time-delayed-logical-replication-subscriber.patch