Appreciate your findings. I am taking your points and acting on them now. We can upgrade. Which version is more stable, 9.2 or 9.1?
We use GIST indexes quite a bit, and we use GIS also. I recently compiled postgres 9.2.

Regards

On Sat, Mar 9, 2013 at 5:09 PM, Lonni J Friedman <netll...@gmail.com> wrote:
> On Sat, Mar 9, 2013 at 1:51 PM, akp geek <akpg...@gmail.com> wrote:
> > thank you. As you mentioned, I understood that I am starting the
> > streaming from scratch, which is not what I wanted to do.
> >
> > Here is what I am planning to do.
> >
> > Our replication process has been down since March 5th.
> >
> > 1. Is it OK to get all WALs from March 5th till now into the standby's pg_xlog?
> > 2. take pg_basebackup
> > 3. export the data dir
> > 4. stop backup
> > 5. restart the standby.
> >
> > Based on my understanding it should work, because when the standby
> > started, it was not able to find the files, as I had deleted them from
> > the slave when I tried to fix the issue.
>
> Clearly it didn't work, because you've been having issues for the past
> 4 days and you've already tried this approach unsuccessfully. Based
> on the log snippets below, it looks like you have multiple problems.
> First, your slave is definitely missing WAL segments. You should
> increase the number of WAL segments that are archived on the master
> and ensure that your base backup includes the WAL segments (by
> generating it with the -x option). However, that's the least of your
> problems at the moment, because it looks like the master is what is
> crashing.
>
> > 00000 2013-03-09 04:56:08 GMT LOG: entering standby mode
> > cp: cannot access /backup/9.0.4/archive/0000000100000106000000E1
> > 00000 2013-03-09 04:56:08 GMT LOG: redo starts at 106/E1000020
> > cp: cannot access /backup/9.0.4/archive/0000000100000106000000E2
> > cp: cannot access /backup/9.0.4/archive/0000000100000106000000E3
> > cp: cannot access /backup/9.0.4/archive/0000000100000106000000E4
> > 00000 2013-03-09 04:56:10 GMT LOG: consistent recovery state reached at
> > 106/E45AD4A8
> > 00000 2013-03-09 04:56:10 GMT LOG: database system is ready to accept
> > read only connections
> > cp: cannot access /backup/9.0.4/archive/0000000100000106000000E5
> > cp: cannot access /backup/9.0.4/archive/0000000100000106000000E6
> > cp: cannot access /backup/9.0.4/archive/0000000100000106000000E7
> > cp: cannot access /backup/9.0.4/archive/0000000100000106000000E8
> > cp: cannot access /backup/9.0.4/archive/0000000100000106000000E9
> > cp: cannot access /backup/9.0.4/archive/0000000100000106000000EA
> > 00000 2013-03-09 04:56:12 GMT LOG: invalid record length at 106/EA10B8C0
> > cp: cannot access /backup/9.0.4/archive/0000000100000106000000EA
> > 00000 2013-03-09 04:56:12 GMT LOG: streaming replication successfully
> > connected to primary
> > [unknown] 00000 2013-03-09 04:57:00 GMT [unknown]LOG: connection
> > received: host=[local]
> > [unknown] 00000 2013-03-09 04:57:03 GMT [unknown]LOG: connection
> > received: host=[local]
> >
> > [unknown] 10.155.253.43(51257) 00000 SELECT 2013-03-09 07:07:18 GMT prodLog:
> > duration: 6316.649 ms
> > [unknown] 10.155.253.43(51257) 00000 idle 2013-03-09 07:47:53 GMT prodLog:
> > disconnection: session time: 0:41:06.529 user=postgres database=fprod
> > host=10.155.253.43 port=51257
> > 00000 2013-03-09 07:55:48 GMT LOG: restartpoint starting: time
> > 00000 2013-03-09 08:25:47 GMT LOG: restartpoint complete: wrote 19419
> > buffers (7.4%); write=1799.792 s, sync=0.066 s, total=1799.867 s
> > 00000 2013-03-09 08:25:47 GMT LOG: recovery restart point at 107/FB01B238
> > 00000 2013-03-09 08:25:47 GMT DETAIL: last completed transaction was at
> > log time
> > 2013-03-09 08:25:41.85776+00
> > 00000 2013-03-09 08:55:48 GMT LOG: restartpoint starting: time
> > psql [local] 08P01 idle 2013-03-09 09:10:52 GMT prodLog: unexpected EOF on
> > client connection
> > psql [local] 00000 idle 2013-03-09 09:10:52 GMT prodLog: disconnection:
> > session time: 2:15:06.351 user=postgres database=fprod host=[local]
> > XX000 2013-03-09 09:23:46 GMT FATAL: failed to add item to index page in
> > 100924/100937/1225845
> > XX000 2013-03-09 09:23:46 GMT CONTEXT: xlog redo Insert item, node:
> > 100924/100937/1225845 blkno: 72666 offset: 234 nitem: 1 isdata: F
> > isleaf F isdelete F updateBlkno:36483
> > 00000 2013-03-09 09:23:46 GMT LOG: startup process (PID 3880) exited
> > with exit code 1
> > 00000 2013-03-09 09:23:46 GMT LOG: terminating any other active server
> > processes
>
> The real problem is here:
> XX000 2013-03-09 09:23:46 GMT FATAL: failed to add item to index
> page in 100924/100937/1225845
> XX000 2013-03-09 09:23:46 GMT CONTEXT: xlog redo Insert item,
> node: 100924/100937/1225845 blkno: 72666 offset: 234 nitem: 1 isdata:
> F isleaf F isdelete F updateBlkno:36483
>
> That looks like something is badly broken (maybe data corruption)?
> Most of the google hits on that error are associated with GIST
> indexes. Are you using GIST indexes?
>
> Are you really (still) using 9.0.4? I hope you realize that there
> are known data corruption bugs in that version, and that version is
> super old at this point. You really need to update to 9.0.12.
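To make my rebuild plan above concrete, this is roughly what I intend to run once the master side is sorted out. The hostname, paths, and the wal_keep_segments value are placeholders for our environment, and it assumes we end up on a release new enough to have pg_basebackup with the -x option:

    # on the master (postgresql.conf): keep more WAL around so the
    # standby can catch up; 512 is a guess, to be sized to our write volume
    wal_keep_segments = 512

    # on the standby: stop the server and move the old data directory aside
    pg_ctl -D /path/to/standby/data stop -m fast
    mv /path/to/standby/data /path/to/standby/data.old

    # take a fresh base backup that bundles the required WAL segments (-x)
    pg_basebackup -h master.example.com -U replicator -D /path/to/standby/data -x -P

    # put recovery.conf (primary_conninfo / restore_command) back in place,
    # then start the standby
    pg_ctl -D /path/to/standby/data start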
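On the suspect index: the numbers in the FATAL line look like a tablespace/database/relfilenode triple, so before touching anything I plan to map that relfilenode back to a relation name on the master, along these lines (the REINDEX target below is just a placeholder until the lookup returns the real index name):

    -- connect to the affected database on the master and look up the
    -- relation behind relfilenode 1225845 (taken from the FATAL message)
    SELECT n.nspname, c.relname, c.relkind
    FROM pg_class c
    JOIN pg_namespace n ON n.oid = c.relnamespace
    WHERE c.relfilenode = 1225845;

    -- if it does turn out to be one of our GIST indexes, rebuild it
    REINDEX INDEX myschema.my_gist_index;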