Hi Andres, regarding your first reply: I was inferring that from the fact that I saw those messages at the same time the replication stream fell behind. Which other logs would be more pertinent to this situation?
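For what it's worth, the lag can also be checked directly rather than inferred from log messages. A minimal sketch (on PostgreSQL 10+ the columns are sent_lsn/replay_lsn rather than sent_location/replay_location):

    -- On the primary: one row per connected standby; the gap between
    -- sent_location and replay_location is how far the standby lags.
    SELECT client_addr, state, sent_location, replay_location
      FROM pg_stat_replication;

    -- On the standby: approximate replay delay in seconds (NULL until
    -- at least one transaction has been replayed).
    SELECT EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp())
           AS lag_seconds;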
On Tue, Jul 24, 2018 at 4:02 PM Andres Freund <and...@anarazel.de> wrote:

> Hi,
>
> On 2018-07-24 15:39:32 -0400, Rory Falloon wrote:
> > Looking for any tips here on how best to maintain a replication slave
> > which is operating under some latency between networks - around 230ms.
> > On a good day/week, replication will keep up for a number of days, but
> > when the link is under higher than average usage, replication can stay
> > active for merely minutes before falling behind again.
> >
> > 2018-07-24 18:46:14 GMT LOG: database system is ready to accept read
> > only connections
> > 2018-07-24 18:46:15 GMT LOG: started streaming WAL from primary at
> > 2B/93000000 on timeline 1
> > 2018-07-24 18:59:28 GMT LOG: incomplete startup packet
> > 2018-07-24 19:15:36 GMT LOG: incomplete startup packet
> > 2018-07-24 19:15:36 GMT LOG: incomplete startup packet
> > 2018-07-24 19:15:37 GMT LOG: incomplete startup packet
> >
> > As you can see above, it lasted about half an hour before falling out
> > of sync.
>
> How can we see that from the above? The "incomplete startup packet"
> messages are independent of streaming rep. I think you need to show us
> more logs.
>
> > On the master, I have wal_keep_segments=128. What is happening when I
> > see "incomplete startup packet" - is it simply that the slave has
> > fallen behind and cannot 'catch up' using the wal segments quickly
> > enough? I assume the slave is using the wal segments to replay
> > changes, and assuming there are enough wal segments to cover the
> > period it cannot stream properly, it will eventually recover?
>
> You might want to look into replication slots to ensure the primary
> keeps the necessary segments, but not more, around. You might also want
> to look at wal_compression, to reduce the bandwidth usage.
>
> Greetings,
>
> Andres Freund
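(For later readers: the two suggestions above translate to roughly the following. This is a sketch, with a made-up slot name, assuming a 9.4-or-later primary and a standby still configured via recovery.conf, i.e. pre-v12.)

    -- On the primary: a physical replication slot makes the primary
    -- retain WAL until this standby has consumed it. Watch disk usage:
    -- if the standby stays disconnected, WAL accumulates without bound.
    SELECT pg_create_physical_replication_slot('standby_slot');

    # postgresql.conf on the primary (9.5+): compress full-page images
    # in WAL, trading some CPU for less traffic over the 230ms link.
    wal_compression = on

    # recovery.conf on the standby: attach to the slot by name.
    primary_slot_name = 'standby_slot'

With a slot in place, wal_keep_segments no longer has to be sized for the worst-case outage; wal_compression takes effect after a config reload, no restart needed.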