Colleagues confirmed that the problem is with the network between the data centers. Thank you!

Sunday, 26 January 2025, 20:33 +03:00, from Adrian Klaver <adrian.kla...@aklaver.com>:
>On 1/26/25 03:29, Дмитрий wrote:
>> "How was it shut down, on purpose or a hardware/software issue?"
>> - I reboot the receiver every two minutes on purpose. I determined this
>> interval empirically, because replication breaks approximately every
>> minute and a half. The reboot helps the receiver advance.
>>
>> "Also do you have corresponding logs from primary?"
>> - Attached to this message.
>>
>> "Unless, is there cascading replication going on?"
>> - No, this is replication from the leader. The leader has its two
>> replicas, and they are all in one data center. The problematic replica
>> is needed for a migration to another data center.
>>
>> "Was that a manual intervention?"
>> - Yes, a scheduled reboot, every two minutes.
>>
>> "Is that what is shown above or have you restarted since the above and
>> the server is running?"
>> - Sometimes replication works without problems for several hours. But
>> when a breakdown occurs, rebooting every two minutes helps this replica
>> catch up.
>
>1) It would make life easier if the log line prefix timestamp were set to
>the same precision on the primary and the standby. As of now it looks like
>the primary has %t (timestamp without milliseconds) and the standby has
>%m (timestamp with milliseconds).
>
>2) From the logs.
>
>Primary:
>
>2025-01-26 12:21:27 MSK [656]: [11-1]
>app=v-host-n1,user=replicator,db=[unknown],client=192.168.5.1 STATEMENT:
>  START_REPLICATION SLOT "slot_migration_to_rcod" 106B6/52000000 TIMELINE 61
>
>2025-01-26 12:21:27 MSK [656]: [12-1]
>app=v-host-n1,user=replicator,db=[unknown],client=192.168.5.1 LOG:
>disconnection: session time: 0:01:05.329 user=replicator database=
>host=192.168.5.1 port=58380
>
>Standby:
>
>2025-01-26 12:21:27.113 MSK [10824] FATAL: could not send data to WAL
>stream: lost synchronization with server: got message type "0", length
>825373235
>
>Do you know what is issuing START_REPLICATION SLOT?
>
>> Another interesting point.
>> In addition to this replication, there are two more, to the same data
>> center. One of them had the same problem, but a one-time restart fixed
>> it, and that replication is still working normally. The other has had
>> no such problems and has been working since its launch, more than a
>> month ago.
>
>--
>Adrian Klaver
>adrian.kla...@aklaver.com
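On Adrian's point 1: the timestamp precision he mentions is controlled by the `log_line_prefix` setting. A minimal sketch of aligning the two servers (the full prefix below is illustrative, reconstructed to resemble the primary's log lines shown above; only the `%m` escape is the actual fix):

```
# postgresql.conf, on both primary and standby:
# %m = timestamp with milliseconds; using the same escape on both
# servers lets their log lines be correlated precisely.
log_line_prefix = '%m [%p] app=%a,user=%u,db=%d,client=%h '
```

After changing it, a reload (`SELECT pg_reload_conf();`) is enough; no restart is required for this parameter.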
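A side note on the standby's FATAL message, which fits the network diagnosis reached in the thread. In the PostgreSQL v3 wire protocol, each message is a one-byte type followed by a four-byte big-endian length, and libpq reports exactly those raw values when it loses synchronization. Decoding the reported type `"0"` and length `825373235` (my own decoding, not from the thread) shows the standby was reading a run of plain ASCII digits, i.e. text bytes rather than protocol frames, consistent with something on the inter-DC network corrupting or injecting data into the connection:

```python
# Reconstruct the raw bytes behind:
#   FATAL: ... lost synchronization with server:
#          got message type "0", length 825373235
# Protocol framing assumed here: 1-byte message type + 4-byte
# big-endian length (PostgreSQL v3 frontend/backend protocol).
import struct

msg_type = "0"          # the 1-byte message type, as reported
length = 825373235      # the 4-byte length field, as reported

raw = msg_type.encode() + struct.pack(">I", length)
print(raw)              # b'01223' -- five printable ASCII digits
```

A valid WAL-stream message type would be a letter such as `d` (CopyData), and lengths are typically small; five consecutive digit bytes strongly suggest the TCP stream carried foreign text at that point.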