On Thu, Jan 20, 2022 at 9:18 PM Amit Kapila <amit.kapil...@gmail.com> wrote: > It might be not reaching the actual send_keep_alive logic in > WalSndKeepaliveIfNecessary because of below code: > { > ... > /* > * Don't send keepalive messages if timeouts are globally disabled or > * we're doing something not partaking in timeouts. > */ > if (wal_sender_timeout <= 0 || last_reply_timestamp <= 0) return; .. > } > > I think you can add elog before the above return and before updating progress > in the below code: > case REORDER_BUFFER_CHANGE_INSERT: > if (!relentry->pubactions.pubinsert) > + { > + OutputPluginUpdateProgress(ctx); > return; > > This will help us to rule out one possibility.
Thanks for your advices! According to your advices, I applied 0001,0002 and 0003 to run the test script. When subscriber timeout, I filter publisher-side log: $ grep "before invoking update progress" pub.log | wc -l 60373557 $ grep "return because wal_sender_timeout or last_reply_timestamp" pub.log | wc -l 0 $ grep "return because waiting_for_ping_response" pub.log | wc -l 0 Based on this result, I think function WalSndKeepaliveIfNecessary was invoked, but function WalSndKeepalive was not invoked because (last_processing >= ping_time) is false. So I tried to see changes about last_processing and last_reply_timestamp (because ping_time is based on last_reply_timestamp). I found last_processing and last_reply_timestamp is set in function ProcessRepliesIfAny. last_processing is set to the time when function ProcessRepliesIfAny is invoked. Only when publisher receive a response from subscriber, last_reply_timestamp is set to last_processing and the flag waiting_for_ping_response is reset to false. When we are during the loop to skip all the changes of transaction, IIUC, we do not invoke function ProcessRepliesIfAny. So I think last_processing and last_reply_timestamp will not be changed in this loop. Therefore I think about our use case, we should modify the condition of invoking WalSndKeepalive.(please refer to 0004-Simple-modification-of-timing.patch, and note that this is only a patch for testing). At the same time I modify the input of WalSndKeepalive from true to false. This is because when input is true, waiting_for_ping_response is set to true in WalSndKeepalive. As mentioned above, ProcessRepliesIfAny is not invoked in the loop, so I think waiting_for_ping_response will not be reset to false and keepalive messages will not be sent. I tested after applying patches(0001 and 0004), I found the timeout was not printed in subscriber-side log. And the added messages "begin load changes" and "commit the log" were printed in publisher-side log: $ grep -ir "begin load changes" pub.log 2022-01-21 11:17:06.934 CST [2577699] LOG: begin load changes $ grep -ir "commit the log" pub.log 2022-01-21 11:21:15.564 CST [2577699] LOG: commit the log Attach the patches and test script mentioned above, in case someone wants to try. Regards, Wang wei
0001-Send-keepalive.patch
Description: 0001-Send-keepalive.patch
0002-Add-some-logs-to-debug.patch
Description: 0002-Add-some-logs-to-debug.patch
0003-Add-more-logs.patch
Description: 0003-Add-more-logs.patch
0004-Simple-modification-of-timing.patch
Description: 0004-Simple-modification-of-timing.patch
test.sh
Description: test.sh