On Fri, Jan 27, 2023 at 19:55 PM Amit Kapila <amit.kapil...@gmail.com> wrote: > On Fri, Jan 27, 2023 at 5:18 PM houzj.f...@fujitsu.com > <houzj.f...@fujitsu.com> wrote: > > > > On Wednesday, January 25, 2023 7:26 PM Amit Kapila > <amit.kapil...@gmail.com> > > > > > > On Tue, Jan 24, 2023 at 8:15 AM wangw.f...@fujitsu.com > > > <wangw.f...@fujitsu.com> wrote: > > > > > > > > Attach the new patch. > > > > > > > > > > I think the patch missed to handle the case of non-transactional messages > which > > > was previously getting handled. I have tried to address that in the > > > attached. > Is > > > there a reason that shouldn't be handled? > > > > Thanks for updating the patch! > > > > I thought about the non-transactional message. I think it seems fine if we > > don’t handle it for timeout because such message is decoded via: > > > > WalSndLoop > > -XLogSendLogical > > --LogicalDecodingProcessRecord > > ---logicalmsg_decode > > ----ReorderBufferQueueMessage > > -----rb->message() -- //maybe send the message or do nothing here. > > > > After invoking rb->message(), we will directly return to the main > > loop(WalSndLoop) where we will get a chance to call > > WalSndKeepaliveIfNecessary() to avoid the timeout. > > > > Valid point. But this means the previous handling of non-transactional > messages was also redundant.
Thanks for the analysis, I think it makes sense. So I removed the handling of non-transactional messages. > > This is a bit different from transactional changes, because for > > transactional > changes, we > > will buffer them and then send every buffered change one by one(via > > ReorderBufferProcessTXN) without going back to the WalSndLoop, so we > don't get > > a chance to send keepalive message if necessary, which is more likely to > > cause > the > > timeout problem. > > > > I will also test the non-transactional message for timeout in case I missed > something. > > > > Okay, thanks. Please see if we can test a mix of transactional and > non-transactional messages as well. I tested a mix transaction of transactional and non-transactional messages on the current HEAD and reproduced the timeout problem. I think this result is OK. Because when decoding a transaction, non-transactional changes are processed directly and the function WalSndKeepaliveIfNecessary is called, while transactional changes are cached and processed after decoding. After decoding, only transactional changes will be processed (in the function ReorderBufferProcessTXN), so the timeout problem will still be reproduced. After applying the v8 patch, the test mentioned above didn't reproduce the timeout problem (Attach this test script 'test_with_nontransactional.sh'). Attach the new patch. Regards, Wang Wei
v8-0001-Fix-the-logical-replication-timeout-during-proces.patch
Description: v8-0001-Fix-the-logical-replication-timeout-during-proces.patch
test_with_nontransactional.sh
Description: test_with_nontransactional.sh