Hi Adrian, thank you very much for your patience. I apologise for the missing information.
On 9 March 2016 16:13:00 +01:00, Adrian Klaver <adrian.kla...@aklaver.com> wrote:
> On 03/09/2016 04:56 AM, <fred...@huitfeldt.com> wrote:
>
> > Hi Adrian,
> >
> > thank you very much for your response.
> >
> > I ran the "VACUUM ANALYZE" command on the master node.
> >
> > Regarding log messages.
> >
> > Here is the contents of the log (excluding connections/disconnections):
> >
> Assuming the below is from the replica database.

The "LOG: recovery has paused" message was indeed from the replica.

> > 2016-02-22 02:30:08 GMT 24616 LOG: recovery has paused
> >
> So what happened to cause the above?

We automatically pause recovery on the replica before running pg_dump, to make certain that we get a consistent dump of the database.

> I am not seeing anything below that indicates the recovery started again.

The reason we do not see a matching "resume" is that the pg_dump failed and our error handling was insufficient.

> > 2016-02-22 02:30:08 GMT 24616 HINT: Execute pg_xlog_replay_resume() to continue.
> > 2016-02-22 02:37:19 GMT 23859 DBNAME ERROR: missing chunk number 0 for toast value 2747579 in pg_toast_22066
> > 2016-02-22 02:37:19 GMT 23859 DBNAME STATEMENT: COPY public.room_shape (room_uuid, data) TO stdout;
> > 2016-02-22 02:37:41 GMT 2648 DBNAME LOG: could not receive data from client: Connection reset by peer
> > 2016-02-22 02:37:41 GMT 2648 DBNAME LOG: unexpected EOF on client connection
> >
> What does the log from the master show?

It doesn't seem to show much.
It does have these repeated messages, however:

2016-02-22 02:12:18 GMT 30908 LOG: using stale statistics instead of current ones because stats collector is not responding
2016-02-22 02:13:01 GMT 30908 LOG: using stale statistics instead of current ones because stats collector is not responding
2016-02-22 02:13:52 GMT 30908 LOG: using stale statistics instead of current ones because stats collector is not responding

There are lots of these messages within the timeframe. There seem to be a couple of them every 2-4 hours.

> >
> > Best regards,
> > Fredrik Huitfeldt
> >
> > On 7 March 2016 16:35:29 +01:00, Adrian Klaver <<adrian.kla...@aklaver.com>> wrote:
> >
> > > On 03/06/2016 10:18 PM, <fred...@huitfeldt.com> <mailto:<fred...@huitfeldt.com>> wrote:
> > >
> > > HI All,
> > >
> > > i would really appreciate any help I can get on this issue.
> > >
> > > basically, a pg_basebackup + streaming attach, led to a database that we
> > > could not read from afterwards.
> > >
> > > From original post:
> > >
> > > <http://www.postgresql.org/message-id/1456919678340.31300.116900@webmail2>
> > >
> > > "The issue remained until we ran a full vacuum analyze on the cluster."
> > >
> > > Which cluster was that, the master or the slave?
> > >
> > > "I have logfiles from the incident, but I cannot see anything out of
> > > the ordinary (despite having a fair amount of experience investigating
> > > postgresql logs)."
> > >
> > > Can we see the section before and after ERROR?
> > >
> > > Beset regards,
> > > Fredrik
> > >
> > > PS please advise if this is better posted on another list.
> > >
> > > --
> > > Adrian Klaver
> > > <adrian.kla...@aklaver.com> <mailto:<adrian.kla...@aklaver.com>>
> > >
>
> --
> Adrian Klaver
> <adrian.kla...@aklaver.com>

Best regards,
Fredrik
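
PS: The message above says recovery was paused before pg_dump and never resumed because the dump failed. As an illustration only, here is a minimal sketch of one way to guarantee the resume step runs even when the dump raises an error. It is not the actual backup script from this thread. The names `dump_with_paused_recovery`, `run_sql` and `run_dump` are hypothetical; the caller is assumed to supply callables that execute SQL on the replica and invoke pg_dump. Only `pg_xlog_replay_pause()` and `pg_xlog_replay_resume()` are real PostgreSQL functions (the latter is named in the HINT line of the log).

```python
from typing import Callable

# Recovery control functions as named in PostgreSQL 9.x.
# (PostgreSQL 10 and later renamed them to pg_wal_replay_pause/resume.)
PAUSE_SQL = "SELECT pg_xlog_replay_pause();"
RESUME_SQL = "SELECT pg_xlog_replay_resume();"


def dump_with_paused_recovery(
    run_sql: Callable[[str], None],   # hypothetical: executes SQL on the replica
    run_dump: Callable[[], None],     # hypothetical: runs pg_dump, raises on failure
) -> None:
    """Pause WAL replay, run the dump, and always resume replay afterwards."""
    run_sql(PAUSE_SQL)
    try:
        run_dump()
    finally:
        # Runs whether the dump succeeded or failed, so the replica is
        # never left paused; any dump error still propagates to the caller.
        run_sql(RESUME_SQL)
```

With this shape, a failed dump still surfaces as an exception, but the log would show replay resuming afterwards instead of staying at "recovery has paused".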