Thanks a lot. One final question on this: if streaming replication fails, as in our case, we have missed 4 days' worth of data. We archive the WAL files on the primary. If we were to restart replication, what should the starting point be?
Do we need to copy all the archived WAL files from that date (in our case 03/05) into the standby's pg_xlog? This is where I got confused. Again, thanks a lot. We will definitely upgrade now.

Regards

On Sat, Mar 9, 2013 at 7:11 PM, Lonni J Friedman <netll...@gmail.com> wrote:
> On Sat, Mar 9, 2013 at 4:05 PM, akp geek <akpg...@gmail.com> wrote:
> > Appreciate your findings. Taking your points and doing things now. We
> > can upgrade. Which version is more stable, 9.2 or 9.1?
>
> They're two entirely different branches. They should both be equally
> stable. However, if you're looking for the most straightforward path
> I'd recommend going to 9.0.12. Also be sure to read the release notes
> first.
>
> > We use GIST indexes quite a bit, and we use GIS as well.
> >
> > I recently compiled postgres 9.2.
> >
> > Regards
> >
> > On Sat, Mar 9, 2013 at 5:09 PM, Lonni J Friedman <netll...@gmail.com> wrote:
> >>
> >> On Sat, Mar 9, 2013 at 1:51 PM, akp geek <akpg...@gmail.com> wrote:
> >> > Thank you. As you mentioned, I understood that I am starting the
> >> > streaming from scratch, which is not what I wanted to do.
> >> >
> >> > Here is what I am planning to do.
> >> >
> >> > Our replication process has been down since March 5th.
> >> >
> >> > 1. Is it OK to copy all WALs from March 5th till now to the standby's pg_xlog?
> >> > 2. Take pg_basebackup.
> >> > 3. Export the data dir.
> >> > 4. Stop backup.
> >> > 5. Restart the standby.
> >> >
> >> > Based on my understanding it should work, because when the standby
> >> > started, it was not able to find the files, as I had deleted them from
> >> > the slave when I tried to fix the issue.
> >>
> >> Clearly it didn't work, because you've been having issues for the past
> >> 4 days and you've already tried this approach unsuccessfully. Based
> >> on the log snippets below, it looks like you have multiple problems.
> >> First, your slave is definitely missing WAL segments. You should
> >> increase the number of WAL segments that are archived on the master
> >> and ensure that your base backup includes the WAL segments (by
> >> generating it with the -x option). However, that's the least of your
> >> problems at the moment, because it looks like the master is what is
> >> crashing.
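(Side note for anyone reading this in the archives: the "-x option" above is presumably pg_basebackup's --xlog flag, which bundles the WAL segments the backup needs into the backup itself. pg_basebackup only exists from 9.1 onwards; on 9.0 the equivalent is the pg_start_backup()/rsync/pg_stop_backup() routine. A rough sketch, with made-up hostname, user and data directory, run from the standby:

    # take a base backup from the master, including the WAL it needs (-x),
    # into the standby's empty data directory; -P just shows progress
    pg_basebackup -h master.example.com -U replication -D /var/lib/pgsql/data -x -P

Raising wal_keep_segments in postgresql.conf on the master is the usual way to keep enough WAL around for a lagging standby to catch up over streaming.)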
> >>
> >> > 00000 2013-03-09 04:56:08 GMT LOG: entering standby mode
> >> > cp: cannot access /backup/9.0.4/archive/0000000100000106000000E1
> >> > 00000 2013-03-09 04:56:08 GMT LOG: redo starts at 106/E1000020
> >> > cp: cannot access /backup/9.0.4/archive/0000000100000106000000E2
> >> > cp: cannot access /backup/9.0.4/archive/0000000100000106000000E3
> >> > cp: cannot access /backup/9.0.4/archive/0000000100000106000000E4
> >> > 00000 2013-03-09 04:56:10 GMT LOG: consistent recovery state reached at 106/E45AD4A8
> >> > 00000 2013-03-09 04:56:10 GMT LOG: database system is ready to accept read only connections
> >> > cp: cannot access /backup/9.0.4/archive/0000000100000106000000E5
> >> > cp: cannot access /backup/9.0.4/archive/0000000100000106000000E6
> >> > cp: cannot access /backup/9.0.4/archive/0000000100000106000000E7
> >> > cp: cannot access /backup/9.0.4/archive/0000000100000106000000E8
> >> > cp: cannot access /backup/9.0.4/archive/0000000100000106000000E9
> >> > cp: cannot access /backup/9.0.4/archive/0000000100000106000000EA
> >> > 00000 2013-03-09 04:56:12 GMT LOG: invalid record length at 106/EA10B8C0
> >> > cp: cannot access /backup/9.0.4/archive/0000000100000106000000EA
> >> > 00000 2013-03-09 04:56:12 GMT LOG: streaming replication successfully connected to primary
> >> > [unknown] 00000 2013-03-09 04:57:00 GMT [unknown]LOG: connection received: host=[local]
> >> > [unknown] 00000 2013-03-09 04:57:03 GMT [unknown]LOG: connection received: host=[local]
> >> >
> >> > [unknown] 10.155.253.43(51257) 00000 SELECT 2013-03-09 07:07:18 GMT prodLog: duration: 6316.649 ms
> >> > [unknown] 10.155.253.43(51257) 00000 idle 2013-03-09 07:47:53 GMT prodLog: disconnection: session time: 0:41:06.529 user=postgres database=fprod host=10.155.253.43 port=51257
> >> > 00000 2013-03-09 07:55:48 GMT LOG: restartpoint starting: time
> >> > 00000 2013-03-09 08:25:47 GMT LOG: restartpoint complete: wrote 19419 buffers (7.4%); write=1799.792 s, sync=0.066 s, total=1799.867 s
> >> > 00000 2013-03-09 08:25:47 GMT LOG: recovery restart point at 107/FB01B238
> >> > 00000 2013-03-09 08:25:47 GMT DETAIL: last completed transaction was at log time 2013-03-09 08:25:41.85776+00
> >> > 00000 2013-03-09 08:55:48 GMT LOG: restartpoint starting: time
> >> > psql [local] 08P01 idle 2013-03-09 09:10:52 GMT prodLog: unexpected EOF on client connection
> >> > psql [local] 00000 idle 2013-03-09 09:10:52 GMT prodLog: disconnection: session time: 2:15:06.351 user=postgres database=fprod host=[local]
> >> > XX000 2013-03-09 09:23:46 GMT FATAL: failed to add item to index page in 100924/100937/1225845
> >> > XX000 2013-03-09 09:23:46 GMT CONTEXT: xlog redo Insert item, node: 100924/100937/1225845 blkno: 72666 offset: 234 nitem: 1 isdata: F isleaf F isdelete F updateBlkno:36483
> >> > 00000 2013-03-09 09:23:46 GMT LOG: startup process (PID 3880) exited with exit code 1
> >> > 00000 2013-03-09 09:23:46 GMT LOG: terminating any other active server processes
> >>
> >> The real problem is here:
> >> XX000 2013-03-09 09:23:46 GMT FATAL: failed to add item to index page in 100924/100937/1225845
> >> XX000 2013-03-09 09:23:46 GMT CONTEXT: xlog redo Insert item, node: 100924/100937/1225845 blkno: 72666 offset: 234 nitem: 1 isdata: F isleaf F isdelete F updateBlkno:36483
> >>
> >> That looks like something is badly broken (maybe data corruption)?
> >> Most of the google hits on that error are associated with GIST
> >> indexes. Are you using GIST indexes?
> >>
> >> Are you really (still) using 9.0.4? I hope you realize that there
> >> are known data corruption bugs in that version, and that version is
> >> super old at this point. You really need to update to 9.0.12.
>
> --
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> L. Friedman netll...@gmail.com
> LlamaLand https://netllama.linux-sxs.org
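P.S. To follow up on the GIST question: yes, we use GIST indexes heavily. If I understand correctly, once we are on 9.0.12 the usual follow-up for this kind of "failed to add item to index page" failure during xlog redo would be to rebuild the affected GIST indexes on the master before re-seeding the standby. Please correct me if this sketch is wrong; the REINDEX target name is just a placeholder:

    -- list the GIST indexes in the fprod database so they can be rebuilt
    SELECT n.nspname, c.relname
    FROM pg_class c
    JOIN pg_am am ON am.oid = c.relam
    JOIN pg_namespace n ON n.oid = c.relnamespace
    WHERE c.relkind = 'i' AND am.amname = 'gist';

    -- then, for each one (locks the underlying table while it runs):
    REINDEX INDEX some_schema.some_gist_index;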
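And back to my original question about the starting point: judging from the "cp: cannot access /backup/9.0.4/archive/..." lines above, our standby's recovery.conf must look roughly like the sketch below (the primary_conninfo values here are placeholders, not our real ones). If the standby is rebuilt from a fresh base backup, my understanding is that recovery starts from that backup's own checkpoint and restore_command then pulls whatever archived WAL it needs from that point forward, so the archive only has to reach back to the start of the base backup rather than to 03/05, and nothing needs to be copied into pg_xlog by hand. Please correct me if that is wrong.

    # recovery.conf on the standby (sketch; connection details are placeholders)
    standby_mode = 'on'
    restore_command = 'cp /backup/9.0.4/archive/%f %p'
    primary_conninfo = 'host=master.example.com port=5432 user=replication'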