On Mon, Sep 20, 2021 at 5:21 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
>
> On Mon, Sep 20, 2021 at 4:10 PM Fabrice Chapuis <fabrice636...@gmail.com> wrote:
> >
> > Hi Amit,
> >
> > We can reproduce the problem: we load a table of several GB into the
> > publisher's schema, and the worker times out one minute after the load
> > finishes. The table being loaded is not replicated.
> >
> > 2021-09-16 12:06:50 CEST [24881]: [1-1] user=postgres,db=db012a00,client=[local] LOG:  duration: 1281408.171 ms  statement: COPY db.table (col1, col2) FROM stdin;
> >
> > 2021-09-16 12:07:11 CEST [12161]: [1-1] user=,db=,client= LOG:  automatic analyze of table "db.table " system usage: CPU: user: 4.13 s, system: 0.55 s, elapsed: 9.58 s
> >
> > 2021-09-16 12:07:50 CEST [3770]: [2-1] user=,db=,client= ERROR:  terminating logical replication worker due to timeout
> >
> > Before increasing the values of wal_sender_timeout and
> > wal_receiver_timeout, I wanted to investigate further the mechanisms
> > leading to this timeout.
> >
>
> The basic problem here seems to be that the WAL sender is not able to
> send a keepalive or any other message within the configured
> wal_receiver_timeout. I am not sure how that can happen, but can you
> try once with autovacuum = off? I want to make sure that the WAL
> sender is not blocked by the autovacuum background process.
>
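
To make the timing concrete, here is a minimal Python sketch of the receiver-side rule described above: the worker gives up once nothing (data or keepalive) has arrived from the WAL sender for wal_receiver_timeout. The function name first_timeout is hypothetical, the model does not include the status messages the worker itself sends, and the usage line assumes the default wal_receiver_timeout of 60 s and that the last message arrived when the COPY finished at 12:06:50.

```python
from datetime import datetime, timedelta


def first_timeout(start, message_times, timeout, end):
    """Return the moment a receiver would give up, or None if it never does.

    start:         time the last message is assumed to have arrived (assumption)
    message_times: times at which any message (data or keepalive) arrived
    timeout:       receiver-side limit, modeled on wal_receiver_timeout
    end:           end of the observation window
    """
    last_recv = start
    # Treat the end of the window as a final check point.
    for t in sorted(message_times) + [end]:
        deadline = last_recv + timeout
        if t > deadline:
            # Nothing arrived before the deadline: the worker would time out.
            return deadline
        last_recv = t
    return None


# Assumed scenario: no message at all after the COPY finished.
copy_done = datetime(2021, 9, 16, 12, 6, 50)
print(first_timeout(copy_done, [], timedelta(seconds=60),
                    datetime(2021, 9, 16, 12, 8)))  # prints 2021-09-16 12:07:50
```

Under those assumptions the computed deadline matches the time of the ERROR in the log, which is consistent with the WAL sender sending nothing at all during that minute.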

Another thing we can try is to check the contents of pg_locks on the
publisher during the minute after the large COPY finishes. We can try
this both with and without autovacuum.
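
For the pg_locks check, one option is to run
SELECT pid, locktype, relation, mode, granted FROM pg_locks;
every few seconds during that minute and look for rows with granted = false. The Python sketch below only post-processes such snapshots, passed in as plain tuples. How they are collected is left out, and the function name waiting_pids is hypothetical. A waiting pid can then be compared with the WAL sender's pid from pg_stat_replication.

```python
from collections import defaultdict


def waiting_pids(snapshots):
    """Collect pids seen waiting on a lock in any snapshot.

    Each snapshot is a list of (pid, locktype, relation, mode, granted)
    tuples, i.e. rows of the pg_locks query above.
    Returns {pid: set of (locktype, relation, mode)} for ungranted rows.
    """
    waiting = defaultdict(set)
    for rows in snapshots:
        for pid, locktype, relation, mode, granted in rows:
            if not granted:
                # This backend requested the lock but has not obtained it yet.
                waiting[pid].add((locktype, relation, mode))
    return dict(waiting)
```

An empty result across all snapshots would suggest the WAL sender is not stuck on a heavyweight lock. Note that pg_locks does not show lightweight locks, so this check cannot rule those out.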

-- 
With Regards,
Amit Kapila.

