On Mon, Sep 20, 2021 at 5:21 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
>
> On Mon, Sep 20, 2021 at 4:10 PM Fabrice Chapuis <fabrice636...@gmail.com>
> wrote:
> >
> > Hi Amit,
> >
> > We can replay the problem: we load a table of several Gb in the schema of
> > the publisher, this generates the worker's timeout after one minute from
> > the end of this load. The table on which this load is executed is not
> > replicated.
> >
> > 2021-09-16 12:06:50 CEST [24881]: [1-1]
> > user=postgres,db=db012a00,client=[local] LOG: duration: 1281408.171 ms
> > statement: COPY db.table (col1, col2) FROM stdin;
> >
> > 2021-09-16 12:07:11 CEST [12161]: [1-1] user=,db=,client= LOG: automatic
> > analyze of table "db.table " system usage: CPU: user: 4.13 s, system: 0.55
> > s, elapsed: 9.58 s
> >
> > 2021-09-16 12:07:50 CEST [3770]: [2-1] user=,db=,client= ERROR:
> > terminating logical replication worker due to timeout
> >
> > Before increasing value for wal_sender_timeout and wal_receiver_timeout I
> > thought to further investigate the mechanisms leading to this timeout.
> >
>
> The basic problem here seems to be that WAL Sender is not able to send
> a keepalive or any other message for the configured
> wal_receiver_timeout. I am not sure how that can happen but can you
> once try by switching autovacuum = off? I wanted to ensure that
> WALSender is not blocked due to the background process autovacuum.
>
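As a quick sanity check on the quoted log, the gap between the COPY completion line and the worker's timeout error can be computed directly from the timestamps. The sketch below assumes wal_receiver_timeout is at its default of 60 seconds (the report only says "after one minute", so the configured value is an assumption), and it assumes the "duration" line is logged at the moment the COPY statement finishes.

```python
from datetime import datetime

# Timestamps copied from the quoted log excerpt. The CEST suffix is dropped
# because all three lines are in the same zone, so differences are unaffected.
COPY_LOGGED = "2021-09-16 12:06:50"     # "duration: 1281408.171 ms" line
ANALYZE_LOGGED = "2021-09-16 12:07:11"  # automatic analyze line
TIMEOUT_LOGGED = "2021-09-16 12:07:50"  # "terminating logical replication worker"

# Assumption: wal_receiver_timeout is left at its default of 60 seconds.
ASSUMED_TIMEOUT_S = 60

FMT = "%Y-%m-%d %H:%M:%S"


def seconds_between(start: str, end: str) -> float:
    """Return the number of seconds from start to end."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds()


gap = seconds_between(COPY_LOGGED, TIMEOUT_LOGGED)
analyze_offset = seconds_between(COPY_LOGGED, ANALYZE_LOGGED)
copy_minutes = 1281408.171 / 1000 / 60  # COPY duration from the log, in minutes

print(f"COPY ran for about {copy_minutes:.1f} minutes")
print(f"analyze logged {analyze_offset:.0f} s after the COPY line")
print(f"timeout logged {gap:.0f} s after the COPY line")
print("gap equals assumed timeout:", gap == ASSUMED_TIMEOUT_S)
```

If the gap matches the configured timeout, that is consistent with the worker receiving nothing at all from the WAL Sender from the moment the load completed, rather than with a slowdown during the load itself.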
The other thing we can try is to check the data in pg_locks on the
publisher during the minute after the large copy has finished. We can
try this both with and without autovacuum.

-- 
With Regards,
Amit Kapila.