On Mon, Sep 20, 2021 at 9:43 PM Fabrice Chapuis <fabrice636...@gmail.com> wrote:
>
> By setting the autovacuum parameter to off, the problem did not occur right
> after loading the table, as it had in our previous tests. However, the timeout
> occurred later. We have seen .snap files accumulate to several GB.
>
> ...
> -rw-------. 1 postgres postgres 16791226 Sep 20 15:26 xid-1238444701-lsn-2D2B-F5000000.snap
> -rw-------. 1 postgres postgres 16973268 Sep 20 15:26 xid-1238444701-lsn-2D2B-F6000000.snap
> -rw-------. 1 postgres postgres 16790984 Sep 20 15:26 xid-1238444701-lsn-2D2B-F7000000.snap
> -rw-------. 1 postgres postgres 16988112 Sep 20 15:26 xid-1238444701-lsn-2D2B-F8000000.snap
> -rw-------. 1 postgres postgres 16864593 Sep 20 15:26 xid-1238444701-lsn-2D2B-F9000000.snap
> -rw-------. 1 postgres postgres 16902167 Sep 20 15:26 xid-1238444701-lsn-2D2B-FA000000.snap
> -rw-------. 1 postgres postgres 16914638 Sep 20 15:26 xid-1238444701-lsn-2D2B-FB000000.snap
> -rw-------. 1 postgres postgres 16782471 Sep 20 15:26 xid-1238444701-lsn-2D2B-FC000000.snap
> -rw-------. 1 postgres postgres 16963667 Sep 20 15:27 xid-1238444701-lsn-2D2B-FD000000.snap
> ...
>

Okay, I am still not sure why the publisher is not sending keep_alive
messages while spilling such a big transaction. We have logic in
WalSndLoop() that, each time after sending data, checks via
WalSndKeepaliveIfNecessary() whether a keep-alive message needs to be
sent. To debug this problem further, I think you need to add some logs
to WalSndKeepaliveIfNecessary() to see why it is not sending keep_alive
messages while all these files are being created.

Did you change the default value of
wal_sender_timeout/wal_receiver_timeout? What are the values of those
variables in your environment? Did you see the message "terminating
walsender process due to replication timeout" in your server logs?

-- 
With Regards,
Amit Kapila.

