Heikki,

Thanks for your help on this issue. I modified my restore script to return 1 only once and that solved the problem.

Cheers,
Chris

On Fri, May 7, 2010 at 3:35 AM, Heikki Linnakangas <heikki.linnakan...@enterprisedb.com> wrote:

> Chris Copeland wrote:
> > I have two servers with the same hardware, OS, and pg binaries. Log files
> > are copied from the master to the standby and the standby is run in recovery
> > mode.
> >
> > When the standby is triggered to come out of recovery mode, it fails and
> > generates a core dump. Upon trying to start it after failure, it starts
> > looking for WAL files that it has already recovered.
> > ...
> > 2010-05-06 10:57:30 CDT :LOG: restored log file "00000001000000AF00000059"
> > from archive
> > >>>> Now I trigger the restore command to return 1 to stop the recovery
> > 2010-05-06 10:59:30 CDT :LOG: could not open file
> > "pg_xlog/00000001000000AF0000005A" (log file 175, segment 90): No such file
> > or directory
> > 2010-05-06 10:59:30 CDT :LOG: redo done at AF/59000068
> > 2010-05-06 10:59:30 CDT :PANIC: could not open file
> > "pg_xlog/00000001000000AF00000059" (log file 175, segment 89): No such file
> > or directory
>
> At startup, the server needs to re-fetch the last checkpoint record.
> That means calling restore_command again for a file that was already
> restored. It looks like restore_command is failing at the re-fetch,
> which causes the PANIC.
>
> To trigger failover, restore_command needs to return 1, once, but it
> must return 0 again on any subsequent calls. I suspect your
> restore_command keeps returning 1 on the subsequent calls.
>
> --
> Heikki Linnakangas
> EnterpriseDB http://www.enterprisedb.com
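
For anyone finding this thread in the archives, here is a minimal sketch of the "return 1 once, then 0 again" behaviour described above. It is an illustration only, not the script used in this thread. The archive directory, trigger file, marker file, and the restore() function name are all hypothetical. It also assumes that a file missing from the archive after failover should be reported as a failure rather than waited for.

```python
import os
import shutil
import time

# Hypothetical paths, chosen for illustration only.
ARCHIVE_DIR = "/var/lib/pgsql/wal_archive"
TRIGGER_FILE = "/tmp/pg_failover.trigger"
MARKER_FILE = "/tmp/pg_failover.signalled"


def restore(wal_name, dest_path, archive_dir=ARCHIVE_DIR,
            trigger_file=TRIGGER_FILE, marker_file=MARKER_FILE,
            poll_seconds=5.0):
    """Return the exit status a restore_command would report for one request."""
    source = os.path.join(archive_dir, wal_name)
    while True:
        triggered = os.path.exists(trigger_file)
        if triggered and not os.path.exists(marker_file):
            # First call after the trigger: remember that failover was
            # signalled, then fail exactly once to end recovery.
            with open(marker_file, "w") as marker:
                marker.write(wal_name + "\n")
            return 1
        if os.path.exists(source):
            # Normal case, and also the re-fetch of an already restored
            # file after failover: copy it and report success.
            shutil.copy(source, dest_path)
            return 0
        if triggered:
            # Assumption: once failover has been signalled, a file that is
            # not in the archive is reported as missing, not waited for.
            return 1
        # Standby mode: wait for the master to ship the next file.
        time.sleep(poll_seconds)
```

A thin wrapper that ends with sys.exit(restore(sys.argv[1], sys.argv[2])) could then be called from restore_command with %f and %p as its two arguments. The marker file is what makes the failure happen only once: the later request for the last restored segment finds the marker, takes the normal copy path, and returns 0.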