On Mon, Sep 20, 2021 at 4:10 PM Fabrice Chapuis <fabrice636...@gmail.com> wrote:
>
> Hi Amit,
>
> We can replay the problem: we load a table of several Gb in the schema of the
> publisher, this generates the worker's timeout after one minute from the end
> of this load. The table on which this load is executed is not replicated.
>
> 2021-09-16 12:06:50 CEST [24881]: [1-1]
> user=postgres,db=db012a00,client=[local] LOG: duration: 1281408.171 ms
> statement: COPY db.table (col1, col2) FROM stdin;
>
> 2021-09-16 12:07:11 CEST [12161]: [1-1] user=,db=,client= LOG: automatic
> analyze of table "db.table " system usage: CPU: user: 4.13 s, system: 0.55 s,
> elapsed: 9.58 s
>
> 2021-09-16 12:07:50 CEST [3770]: [2-1] user=,db=,client= ERROR: terminating
> logical replication worker due to timeout
>
> Before increasing value for wal_sender_timeout and wal_receiver_timeout I
> thought to further investigate the mechanisms leading to this timeout.
>

The basic problem here seems to be that the walsender is not able to send a
keepalive or any other message within the configured wal_receiver_timeout.
I am not sure how that can happen, but could you try once with
autovacuum = off? I want to make sure that the walsender is not being blocked
by the autovacuum background process.

-- 
With Regards,
Amit Kapila.
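
P.S. A minimal sketch of how this test could be run, assuming superuser
access and that a configuration reload is acceptable on the publisher. The
statements are standard PostgreSQL commands; which node each one runs on is
my assumption about the setup described above.

```sql
-- Check the timeouts currently in effect (both default to 60s).
SHOW wal_sender_timeout;      -- run on the publisher
SHOW wal_receiver_timeout;    -- run on the subscriber

-- Temporarily disable autovacuum on the publisher.
-- autovacuum can be changed with a reload; no restart is needed.
ALTER SYSTEM SET autovacuum = off;
SELECT pg_reload_conf();
SHOW autovacuum;              -- should now report "off"

-- After re-running the load, restore the previous setting.
ALTER SYSTEM RESET autovacuum;
SELECT pg_reload_conf();
```

If the worker still times out with autovacuum disabled, that would suggest
autovacuum is not what is blocking the walsender.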