My colleagues confirmed that the problem is in the network between the data centers.
Thank you!
Sunday, 26 January 2025, 20:33 +03:00, from Adrian Klaver
adrian.kla...@aklaver.com:

>On 1/26/25 03:29, Дмитрий wrote:
> "How was it shut down, on purpose or a hardware/software issue?"
> - I reboot the receiver every 2 minutes on purpose. I determined this 
> time empirically, because replication breaks down approximately every 
> minute and a half. The reboot helps to advance the receiver.
>
> "Also do you have corresponding logs from primary?"
> - Attached to this message.
>
> "Unless, is there cascading replication going on?"
> - No, this is replication directly from the leader. The leader has two
> replicas of its own, and they are all in one data center. The problematic
> replica is needed for the migration to another data center.
>
> "Was that a manual intervention?"
> - Yes, reboot on schedule, every two minutes.
>
> "Is that what is shown above or have you restarted since the above and
> the server is running?"
> - Sometimes replication works without problems for several hours. But
> when a breakdown occurs, rebooting every two minutes helps this replica
> catch up.
>1) It would make life easier if the log line prefix timestamp were
>set to the same precision on the primary and the standby. As of now it
>looks like the primary has %t (timestamp without milliseconds) and the
>standby has %m (timestamp with milliseconds).
>
>2) From the logs.
>
>Primary:
>
>2025-01-26 12:21:27 MSK [656]: [11-1] 
>app=v-host-n1,user=replicator,db=[unknown],client=192.168.5.1 STATEMENT: 
>  START_REPLICATION SLOT "slot_migration_to_rcod" 106B6/52000000 TIMELINE 61
>
>2025-01-26 12:21:27 MSK [656]: [12-1] 
>app=v-host-n1,user=replicator,db=[unknown],client=192.168.5.1 LOG: 
>disconnection: session time: 0:01:05.329 user=replicator database= 
>host=192.168.5.1 port=58380
>
>
>Standby:
>
>2025-01-26 12:21:27.113 MSK [10824] FATAL:  could not send data to WAL 
>stream: lost synchronization with server: got message type "0", length 
>825373235
>
>
>Do you know what is issuing the START_REPLICATION SLOT command?
>
>
> Another interesting point. In addition to this replication, there are
> two more, to the same data center. One of them had the same problem,
> but a single restart fixed it, and it is still working normally. The
> second has had no such problems and has been working since its launch
> more than a month ago.
>
>-- 
>Adrian Klaver
>adrian.kla...@aklaver.com
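
A possible way to read the standby error quoted above, offered as a sketch rather than a diagnosis: in the PostgreSQL frontend/backend protocol, each message starts with a one-byte type followed by a four-byte length in network (big-endian) byte order. Re-encoding the type and length from the log back into bytes shows that they are all printable ASCII digits, which would be consistent with the receiver treating the middle of a data stream as a message header. The helper name `length_as_bytes` is hypothetical and exists only for this illustration.

```python
import struct


def length_as_bytes(length: int) -> bytes:
    """Return the 4 bytes that encode `length` as a big-endian signed int32."""
    return struct.pack(">i", length)


# Values copied from the standby log line:
#   got message type "0", length 825373235
reported_type = "0"
reported_length = 825373235

# Rebuild the 5 header bytes the receiver would have read off the wire.
raw_header = reported_type.encode("ascii") + length_as_bytes(reported_length)
print(raw_header)  # b'01223'
```

A length of roughly 825 MB for a single message is implausible, so the header bytes most likely belong to payload data rather than to a real message boundary. That does not say where the stream was damaged, only that the two sides no longer agreed on where messages begin.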
