Re: Logical replication failed with SSL SYSCALL error

shaurya jain Sun, 23 Apr 2023 20:00:09 -0700

Hi Vignesh,

That's really prompt and solves our problem. Thank you buddy.


Please go through my inline comments:-


On Thu, Apr 20, 2023 at 11:49 AM vignesh C <vignes...@gmail.com> wrote:

> On Wed, 19 Apr 2023 at 17:26, shaurya jain <12345shau...@gmail.com> wrote:
> >
> > Hi Team,
> >
> > Could you please help me with this? It's urgent for the production environment.
> >
> > On Wed, Apr 19, 2023 at 3:44 PM shaurya jain <12345shau...@gmail.com> wrote:
> >>
> >> Hi Team,
> >>
> >> Could you please help? It's urgent for the production env.
> >>
> >> On Sun, Apr 16, 2023 at 2:40 AM shaurya jain <12345shau...@gmail.com> wrote:
> >>>
> >>> Hi Team,
> >>>
> >>> Postgres Version:- 13.8
> >>> Issue:- Logical replication failing with SSL SYSCALL error
> >>> Priority:-High
> >>>
> >>> We are migrating our database through logical replication, and all of a sudden the errors below popped up in the source and target logs, which leads us nowhere.
> >>>
> >>> Logs from Source:-
> >>> LOG:  could not send data to client: Connection reset by peer
> >>> STATEMENT:  COPY public.test TO STDOUT
> >>> FATAL:  connection to client lost
> >>> STATEMENT:  COPY public.test TO STDOUT
> >>>
> >>> Logs from Target:-
> >>> 2023-04-15 19:07:02 UTC::@:[1250]:ERROR: could not receive data from WAL stream: SSL SYSCALL error: Connection timed out
> >>> 2023-04-15 19:07:02 UTC::@:[1250]:CONTEXT: COPY test, line 365326932
> >>> 2023-04-15 19:07:03 UTC::@:[505]:LOG: background worker "logical replication worker" (PID 1250) exited with exit code 1
> >>> 2023-04-15 19:07:03 UTC::@:[7155]:LOG: logical replication table synchronization worker for subscription " sub_tables_2_180", table "test" has started
> >>> 2023-04-15 19:12:05 UTC:10.144.19.34(33276):postgres@webadmit_staging:[7112]:WARNING: there is no transaction in progress
> >>> 2023-04-15 19:14:08 UTC:10.144.19.34(33324):postgres@webadmit_staging:[6052]:LOG: could not receive data from client: Connection reset by peer
> >>> 2023-04-15 19:17:23 UTC::@:[2112]:ERROR: could not receive data from WAL stream: SSL SYSCALL error: Connection timed out
> >>> 2023-04-15 19:17:23 UTC::@:[1089]:ERROR: could not receive data from WAL stream: SSL SYSCALL error: Connection timed out
> >>> 2023-04-15 19:17:23 UTC::@:[2556]:ERROR: could not receive data from WAL stream: SSL SYSCALL error: Connection timed out
> >>> 2023-04-15 19:17:23 UTC::@:[505]:LOG: background worker "logical replication worker" (PID 2556) exited with exit code 1
> >>> 2023-04-15 19:17:23 UTC::@:[505]:LOG: background worker "logical replication worker" (PID 2112) exited with exit code 1
> >>> 2023-04-15 19:17:23 UTC::@:[505]:LOG: background worker "logical replication worker" (PID 1089) exited with exit code 1
> >>> 2023-04-15 19:17:23 UTC::@:[7287]:LOG: logical replication apply worker for subscription "sub_tables_2_180" has started
> >>> 2023-04-15 19:17:23 UTC::@:[7288]:LOG: logical replication apply worker for subscription "sub_tables_3_192" has started
> >>> 2023-04-15 19:17:23 UTC::@:[7289]:LOG: logical replication apply worker for subscription "sub_tables_1_180" has started
> >>>
> >>> Just after this error, all other replication slots get disabled for some time and then come back online, along with a COPY command running under a new PID in pg_stat_activity.
> >>>
> >>> I have a few queries regarding this:-
> >>>
> >>> The exact reason for the disconnection (a few articles blame memory and a few the network)
> This might be because of a network failure. Did you notice any network
> instability? Could you check the TCP settings?
> You could check the following configurations: tcp_keepalives_idle,
> tcp_keepalives_interval and tcp_keepalives_count.
> With these, a keepalive probe is sent once the connection has been idle
> for tcp_keepalives_idle seconds; if the peer does not respond within
> tcp_keepalives_interval seconds the probe is retried, and the connection
> is considered gone after tcp_keepalives_count failures.

---Yes, you were correct. The issue was related to the network: the VPN
tunnel got restarted because of a misconfiguration on the tunnel side.
Since fixing that, it stands resolved so far. These params were set to the
values below:-


   1. keepalives_idle 60
   2. keepalives_interval 100
   3. keepalives_count 60
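
As a rough sanity check on the values above, the worst-case time before a silent peer is declared dead can be estimated as idle + interval * count. This is only a minimal sketch, assuming the usual TCP keepalive behaviour (as on Linux); the function name is made up for illustration and is not part of PostgreSQL or libpq.

```python
def keepalive_detection_seconds(idle: int, interval: int, count: int) -> int:
    """Approximate seconds before an unresponsive peer is declared dead.

    Assumes the first probe goes out after `idle` seconds of silence and
    each of the `count` probes waits `interval` seconds for a reply.
    """
    return idle + interval * count


# Values quoted in this mail: idle=60, interval=100, count=60.
worst_case = keepalive_detection_seconds(idle=60, interval=100, count=60)
print(f"{worst_case} s (~{worst_case / 60:.0f} min)")  # 6060 s (~101 min)
```

If that estimate holds, a dead connection could go unnoticed for roughly 1 hour 41 minutes with these values, so a smaller interval or count may be worth considering when faster detection matters.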


> >>> Will it lead to data inconsistency?
> It will not lead to inconsistency. In case of failure the failed
> transaction will be rolled back.

---Yes, the migration was up to the mark after fixing the network.

>
> >>> Does the COPY command under the new PID migrate the whole data of the test table once again?
> Yes, it will migrate the whole table data again in case of failures.

---Yes, I follow you on that. Is there any way to rsync instead of a simple copy?

>
> Regards,
> Vignesh
>


-- 
Thanks and Regards,
Shaurya Jain
email:- 12345shau...@gmail.com
*Mobile:- +91-8802809405*
LinkedIn:- https://www.linkedin.com/in/shaurya-jain-74353023
