Hi Vignesh,

That was really prompt, and it solves our problem. Thank you, buddy.
Please go through my inline comments:

On Thu, Apr 20, 2023 at 11:49 AM vignesh C <vignes...@gmail.com> wrote:

> On Wed, 19 Apr 2023 at 17:26, shaurya jain <12345shau...@gmail.com> wrote:
> >
> > Hi Team,
> >
> > Could you please help me with this? It's urgent for the production environment.
> >
> > On Wed, Apr 19, 2023 at 3:44 PM shaurya jain <12345shau...@gmail.com> wrote:
> >>
> >> Hi Team,
> >>
> >> Could you please help? It's urgent for the production env.
> >>
> >> On Sun, Apr 16, 2023 at 2:40 AM shaurya jain <12345shau...@gmail.com> wrote:
> >>>
> >>> Hi Team,
> >>>
> >>> Postgres Version:- 13.8
> >>> Issue:- Logical replication failing with SSL SYSCALL error
> >>> Priority:- High
> >>>
> >>> We are migrating our database through logical replication, and all of a sudden the error below popped up in the source and target logs, which leads us nowhere.
> >>>
> >>> Logs from Source:-
> >>> LOG: could not send data to client: Connection reset by peer
> >>> STATEMENT: COPY public.test TO STDOUT
> >>> FATAL: connection to client lost
> >>> STATEMENT: COPY public.test TO STDOUT
> >>>
> >>> Logs from Target:-
> >>> 2023-04-15 19:07:02 UTC::@:[1250]:ERROR: could not receive data from WAL stream: SSL SYSCALL error: Connection timed out
> >>> 2023-04-15 19:07:02 UTC::@:[1250]:CONTEXT: COPY test, line 365326932
> >>> 2023-04-15 19:07:03 UTC::@:[505]:LOG: background worker "logical replication worker" (PID 1250) exited with exit code 1
> >>> 2023-04-15 19:07:03 UTC::@:[7155]:LOG: logical replication table synchronization worker for subscription " sub_tables_2_180", table "test" has started
> >>> 2023-04-15 19:12:05 UTC:10.144.19.34(33276):postgres@webadmit_staging:[7112]:WARNING: there is no transaction in progress
> >>> 2023-04-15 19:14:08 UTC:10.144.19.34(33324):postgres@webadmit_staging:[6052]:LOG: could not receive data from client: Connection reset by peer
> >>> 2023-04-15 19:17:23 UTC::@:[2112]:ERROR: could not receive data from WAL stream: SSL SYSCALL error: Connection timed out
> >>> 2023-04-15 19:17:23 UTC::@:[1089]:ERROR: could not receive data from WAL stream: SSL SYSCALL error: Connection timed out
> >>> 2023-04-15 19:17:23 UTC::@:[2556]:ERROR: could not receive data from WAL stream: SSL SYSCALL error: Connection timed out
> >>> 2023-04-15 19:17:23 UTC::@:[505]:LOG: background worker "logical replication worker" (PID 2556) exited with exit code 1
> >>> 2023-04-15 19:17:23 UTC::@:[505]:LOG: background worker "logical replication worker" (PID 2112) exited with exit code 1
> >>> 2023-04-15 19:17:23 UTC::@:[505]:LOG: background worker "logical replication worker" (PID 1089) exited with exit code 1
> >>> 2023-04-15 19:17:23 UTC::@:[7287]:LOG: logical replication apply worker for subscription "sub_tables_2_180" has started
> >>> 2023-04-15 19:17:23 UTC::@:[7288]:LOG: logical replication apply worker for subscription "sub_tables_3_192" has started
> >>> 2023-04-15 19:17:23 UTC::@:[7289]:LOG: logical replication apply worker for subscription "sub_tables_1_180" has started
> >>>
> >>> Just after this error, all other replication slots get disabled for some time and come back online, along with a COPY command with a new PID in pg_stat_activity.
> >>>
> >>> I have a few queries regarding this:-
> >>>
> >>> The exact reason for disconnection (a few articles claim memory and a few claim network)
> This might be because of a network failure. Did you notice any network instability? Could you check the TCP settings?
> You could check the following configurations: tcp_keepalives_idle, tcp_keepalives_interval and tcp_keepalives_count.
> This means a keepalive probe is sent once the connection has been idle for tcp_keepalives_idle seconds; if the peer does not respond within tcp_keepalives_interval seconds it will try again, and it will consider the connection gone after tcp_keepalives_count failures.
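
To put rough numbers on the keepalive behaviour described above, here is a minimal Python sketch. It assumes the common approximation that a dead, idle connection is declared lost after about idle + interval * count seconds; the exact timing depends on the operating system's TCP stack. The function names, the host source.example.com, the database name mydb and the second set of values (60, 10, 6) are made up for illustration. The keepalives, keepalives_idle, keepalives_interval and keepalives_count keywords are the libpq connection parameters that correspond to the server-side tcp_keepalives_* settings.

```python
def keepalive_detection_seconds(idle: int, interval: int, count: int) -> int:
    """Approximate seconds before an idle, dead TCP peer is declared lost.

    Assumption: first probe after `idle` seconds of silence, then one probe
    every `interval` seconds, giving up after `count` unanswered probes.
    """
    return idle + interval * count


def conninfo_with_keepalives(base: str, idle: int, interval: int, count: int) -> str:
    """Append libpq keepalive parameters to a connection string.

    `base` is a hypothetical conninfo prefix such as "host=... dbname=...".
    """
    return (
        f"{base} keepalives=1 keepalives_idle={idle} "
        f"keepalives_interval={interval} keepalives_count={count}"
    )


# Values reported in this thread: idle 60, interval 100, count 60.
reported = keepalive_detection_seconds(idle=60, interval=100, count=60)
print(reported, "seconds, about", reported // 60, "minutes")

# Hypothetical tighter values, only to show how the string is built.
print(conninfo_with_keepalives("host=source.example.com dbname=mydb", 60, 10, 6))
```

Under this approximation, the values 60, 100 and 60 mean a silently dropped connection could go unnoticed for roughly 101 minutes, so smaller interval and count values may be worth testing if faster detection matters.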

---Yes, you were correct. The issue was related to the network: the VPN tunnel got restarted because of a misconfiguration on the tunnel side. Since fixing that, it stands resolved so far. These params were set to the values below:-

1. keepalives_idle 60
2. keepalives_interval 100
3. keepalives_count 60

> >>> Will it lead to data inconsistency?
> It will not lead to inconsistency. In case of failure the failed transaction will be rolled back.

Yes, the migration was up to the mark after fixing the network.

> >>> Does this new PID COPY command again migrate the whole data of the test table once again?
> Yes, it will migrate the whole table data again in case of failures.

Yes, I follow you on that. Is there any way to rsync instead of a simple copy?

> Regards,
> Vignesh

--
Thanks and Regards,
Shaurya Jain
email:- 12345shau...@gmail.com
*Mobile:- +91-8802809405*
LinkedIn:- https://www.linkedin.com/in/shaurya-jain-74353023