Re: Logical replication keepalive flood

Greg Nancarrow Fri, 01 Oct 2021 01:14:50 -0700

On Thu, Sep 30, 2021 at 5:56 PM Kyotaro Horiguchi
<horikyota....@gmail.com> wrote:
>
> After the patch applied, that keepalive is sent only when the loop is
> actually going to sleep some time.  In case the next WAL doesn't come
> for KEEPALIVE_TIMEOUT milliseconds, it sends a keepalive. There's a
> dubious behavior when sleeptime <= KEEPALIVE_TIMEOUT that it sends a
> keepalive immediately.  It was (as far as I recall) intentional in
> order to make the code simpler.  However, on second thought, we will
> have the next chance to send keepalive in that case, and intermittent
> frequent keepalives can happen with that behavior.  So I came to think
> that we can omit keepalives at all that case.
>
> (I myself haven't see the keepalive flood..)
>


I tried your updated patch
(avoid_keepalive_flood_at_bleeding_edge_of_wal.patch, rebased) and
also manually applied your previous keepalive-counting code
(count_keepalives2.diff.txt), adapted to the code updates.
I tested both the problem originally reported (which used
pg_recvlogical) and similarly using pub/sub of the pgbench_history
table, and in both cases I found that your patch very significantly
reduced the keepalives, so the keepalive flood is no longer seen.
I am still a little unsure about the impact on pg_recvlogical --endpos
functionality, which is detected by the regression test failure. I did
try to update pg_recvlogical, to not rely on a keepalive for --endpos,
but so far haven't been successful in doing that. If the test is
altered/removed then I think that the documentation for pg_recvlogical
--endpos will need updating in some way.

Regards,
Greg Nancarrow
Fujitsu Australia

Re: Logical replication keepalive flood

Reply via email to