Re: [BUGS] Recovery bug

2010-11-11 Jeff Davis
On Thu, 2010-11-11 at 18:20 +0200, Heikki Linnakangas wrote:
> On 11.11.2010 02:20, Jeff Davis wrote:
> > There is a problem with this patch. ReadRecord() not only modifies
> > global variables, it also modifies the location pointed to by "record",
> > which is later used to set "wasShutdown". How
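A minimal sketch of the hazard described here, assuming a reader that returns a pointer into a single buffer it reuses on every call. The record layout, the SHUTDOWN flag and the names sketch_record, sketch_read_record, was_shutdown_buggy and was_shutdown_fixed are all hypothetical; this is not the PostgreSQL ReadRecord() code. It only shows why a flag derived from "record" has to be captured before any further read.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record layout, not PostgreSQL's XLogRecord. */
typedef struct {
    uint64_t lsn;   /* position of the record in the log */
    uint8_t  info;  /* record flags */
} sketch_record;

#define SKETCH_INFO_SHUTDOWN 0x01u /* hypothetical "shutdown checkpoint" flag */

/* One buffer shared by all reads: each call overwrites what the previous
   call returned, which is the aliasing problem described above. */
static sketch_record sketch_read_buf;

/* Hypothetical reader: copies the record at lsn from a small in-memory log
   into the shared buffer and returns a pointer to that buffer. */
static const sketch_record *sketch_read_record(const sketch_record *wal,
                                               size_t nwal, uint64_t lsn)
{
    for (size_t i = 0; i < nwal; i++) {
        if (wal[i].lsn == lsn) {
            sketch_read_buf = wal[i];
            return &sketch_read_buf;
        }
    }
    return NULL;
}

/* Buggy order: the flag is read through the old pointer after a second
   read (e.g. of the REDO record) has replaced the buffer contents. */
static bool was_shutdown_buggy(const sketch_record *wal, size_t nwal,
                               uint64_t ckpt_lsn, uint64_t redo_lsn)
{
    const sketch_record *record = sketch_read_record(wal, nwal, ckpt_lsn);
    if (record == NULL)
        return false;
    (void) sketch_read_record(wal, nwal, redo_lsn);
    return (record->info & SKETCH_INFO_SHUTDOWN) != 0;
}

/* Fixed order: copy the flag out before anything can reuse the buffer. */
static bool was_shutdown_fixed(const sketch_record *wal, size_t nwal,
                               uint64_t ckpt_lsn, uint64_t redo_lsn)
{
    const sketch_record *record = sketch_read_record(wal, nwal, ckpt_lsn);
    if (record == NULL)
        return false;
    bool was_shutdown = (record->info & SKETCH_INFO_SHUTDOWN) != 0;
    (void) sketch_read_record(wal, nwal, redo_lsn);
    return was_shutdown;
}
```

In the sketch, the buggy variant reports false for a shutdown checkpoint whenever the second read lands on a non-shutdown record; copying the flag into a local first gives the right answer no matter what later reads return.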

Re: [BUGS] Recovery bug

2010-11-11 Heikki Linnakangas
On 11.11.2010 02:20, Jeff Davis wrote:
There is a problem with this patch. ReadRecord() not only modifies global variables, it also modifies the location pointed to by "record", which is later used to set "wasShutdown". How about if we only set "wasShutdown" if there is no backup_label (because t

Re: [BUGS] Recovery bug

2010-11-10 Jeff Davis
On Tue, 2010-10-26 at 10:48 +0300, Heikki Linnakangas wrote:
> > The reason I didn't use ReadRecord is because it sets a global variable
> > to point to the next location in the log, so that subsequent calls can
> > just pass NULL for the location.
>
> True. XLogPageRead is new in 9.0, however. We
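The convention mentioned here (the reader remembers where the next record starts, so later calls can pass NULL) can be sketched as follows. The names sketch_rec, sketch_next_ptr and sketch_read are hypothetical and the log is a tiny in-memory array; the point is only that an extra one-off read, such as a lookup of the REDO record, silently moves the shared cursor.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory log entry: record start position and length. */
typedef struct {
    uint64_t lsn;
    uint32_t len;
} sketch_rec;

/* Global cursor: where the record after the last successful read begins. */
static uint64_t sketch_next_ptr;

/* Reads the record at *start, or at the global cursor when start is NULL,
   and advances the cursor past the record that was read. */
static const sketch_rec *sketch_read(const sketch_rec *wal, size_t n,
                                     const uint64_t *start)
{
    uint64_t want = (start != NULL) ? *start : sketch_next_ptr;
    for (size_t i = 0; i < n; i++) {
        if (wal[i].lsn == want) {
            sketch_next_ptr = wal[i].lsn + wal[i].len;
            return &wal[i];
        }
    }
    return NULL;
}
```

With records at 0, 24 and 64, reading at 24 and then passing NULL yields the record at 64. If a side read at position 0 happens in between, the NULL call resumes at 24 instead, which is why a shared cursor is awkward for a one-off lookup.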

Re: [BUGS] Recovery bug

2010-10-26 Heikki Linnakangas
On 26.10.2010 10:48, Heikki Linnakangas wrote:
On 25.10.2010 19:04, Jeff Davis wrote:
On Mon, 2010-10-25 at 14:44 +0300, Heikki Linnakangas wrote:
It seems we should use ReadRecord instead of the lower-level XLogPageRead function. One difference is that ReadRecord performs a bunch of sanity che

Re: [BUGS] Recovery bug

2010-10-26 Heikki Linnakangas
On 25.10.2010 19:04, Jeff Davis wrote:
On Mon, 2010-10-25 at 14:44 +0300, Heikki Linnakangas wrote:
It seems we should use ReadRecord instead of the lower-level XLogPageRead function. One difference is that ReadRecord performs a bunch of sanity checks on the record, while XLogPageRead just reads

Re: [BUGS] Recovery bug

2010-10-25 Jeff Davis
On Mon, 2010-10-25 at 14:44 +0300, Heikki Linnakangas wrote:
> It seems we should use ReadRecord instead of the lower-level
> XLogPageRead function. One difference is that ReadRecord performs a
> bunch of sanity checks on the record, while XLogPageRead just reads the
> raw page. Extra sanity che
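To make the difference concrete: a raw page read hands back whatever bytes are on the page, while a record-level read can refuse bytes that do not look like a valid record. The sketch below assumes a hypothetical header (total length plus a checksum) and uses 32-bit FNV-1a purely as a stand-in checksum; PostgreSQL's real record header and CRC are different, and sketch_hdr, sketch_fnv1a32 and sketch_record_is_valid are illustrative names only.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record header; not PostgreSQL's XLogRecord layout. */
typedef struct {
    uint32_t tot_len;  /* header plus payload, in bytes */
    uint32_t checksum; /* checksum of the payload bytes */
} sketch_hdr;

/* 32-bit FNV-1a, used here only as a simple stand-in for a real CRC. */
static uint32_t sketch_fnv1a32(const uint8_t *p, size_t n)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Sanity checks a record-level reader can apply on top of a raw page read:
   the header must fit, the length must be plausible, the checksum must match. */
static bool sketch_record_is_valid(const uint8_t *page, size_t page_len,
                                   size_t off)
{
    sketch_hdr h;
    if (off > page_len || page_len - off < sizeof h)
        return false;                 /* header would run off the page */
    memcpy(&h, page + off, sizeof h); /* copy out to avoid unaligned access */
    if (h.tot_len < sizeof h || h.tot_len > page_len - off)
        return false;                 /* implausible total length */
    return sketch_fnv1a32(page + off + sizeof h, h.tot_len - sizeof h)
           == h.checksum;
}
```

An all-zero page, which a raw read would return without complaint, fails the length check; a record whose payload was altered after the checksum was computed fails the checksum comparison.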

Re: [BUGS] Recovery bug

2010-10-25 Heikki Linnakangas
On 19.10.2010 22:40, Jeff Davis wrote:
On Tue, 2010-10-19 at 09:51 -0700, Jeff Davis wrote:
On Tue, 2010-10-19 at 12:26 +0300, Heikki Linnakangas wrote:
Excluding pg_xlog is just a recommendation at the moment, though, so we would need a big warning in the docs. And some way to enforce that jus

Re: [BUGS] Recovery bug

2010-10-19 Robert Haas
On Tue, Oct 19, 2010 at 5:26 AM, Heikki Linnakangas wrote:
> The fundamental problem is that by definition, a base backup is completely
> indistinguishable from the data directory in the original server. Or is it?
> We recommend that you exclude the files under pg_xlog from the backup. So we
> cou

Re: [BUGS] Recovery bug

2010-10-19 Jeff Davis
On Tue, 2010-10-19 at 09:51 -0700, Jeff Davis wrote:
> On Tue, 2010-10-19 at 12:26 +0300, Heikki Linnakangas wrote:
> > Excluding pg_xlog is just a recommendation at the moment, though, so we
> > would need a big warning in the docs. And some way to enforce that
> > just_kidding is not included i

Re: [BUGS] Recovery bug

2010-10-19 Jeff Davis
On Tue, 2010-10-19 at 12:26 +0300, Heikki Linnakangas wrote:
> >1. If reading a checkpoint from the backup_label location, verify that
> > the REDO location for that checkpoint exists in addition to the
> > checkpoint itself. If not, elog with a FATAL immediately.
>
> Makes sense. I wonder if
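A minimal sketch of the first proposed check, assuming the default 16 MB segment size and modeling "the location exists" as "the WAL segment containing it is available". The enum, the function names and the list-of-segment-numbers input are hypothetical. A real server would turn any non-OK result into an elog(FATAL, ...); the sketch returns a status instead only so the logic can be exercised.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumes the default 16 MB WAL segment size. */
#define SKETCH_SEG_SIZE UINT64_C(0x1000000)

typedef enum {
    SKETCH_START_OK,
    SKETCH_START_NO_CHECKPOINT, /* checkpoint record's segment missing */
    SKETCH_START_NO_REDO        /* REDO location's segment missing */
} sketch_start_status;

/* True if the segment that contains lsn is in the list of available ones. */
static bool sketch_segment_available(const uint64_t *segs, size_t nsegs,
                                     uint64_t lsn)
{
    uint64_t want = lsn / SKETCH_SEG_SIZE;
    for (size_t i = 0; i < nsegs; i++)
        if (segs[i] == want)
            return true;
    return false;
}

/* Check both the checkpoint named by backup_label and its REDO location
   before starting replay; the caller would raise FATAL on anything but OK. */
static sketch_start_status sketch_check_backup_start(const uint64_t *segs,
                                                     size_t nsegs,
                                                     uint64_t ckpt_lsn,
                                                     uint64_t redo_lsn)
{
    if (!sketch_segment_available(segs, nsegs, ckpt_lsn))
        return SKETCH_START_NO_CHECKPOINT;
    if (!sketch_segment_available(segs, nsegs, redo_lsn))
        return SKETCH_START_NO_REDO;
    return SKETCH_START_OK;
}
```

This checks only the two endpoints named in the proposal; replay from the REDO location to the checkpoint also needs every segment in between, so a stricter variant would walk that whole range.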

Re: [BUGS] Recovery bug

2010-10-19 Heikki Linnakangas
On 18.10.2010 01:48, Jeff Davis wrote:
On Fri, 2010-10-15 at 15:58 -0700, Jeff Davis wrote:
I don't have a fix yet, because I think it requires a little discussion. For instance, it seems to be dangerous to assume that we're starting up from a backup with access to the archive when it might have

Re: [BUGS] Recovery bug

2010-10-19 Fujii Masao
On Tue, Oct 19, 2010 at 5:18 PM, Heikki Linnakangas wrote:
> On 19.10.2010 08:51, Fujii Masao wrote:
>>
>> On Tue, Oct 19, 2010 at 7:00 AM, Jeff Davis wrote:
>
> Do users have any expectation that they can restore a backup without
> using recovery.conf by merely having the WAL segment

Re: [BUGS] Recovery bug

2010-10-19 Heikki Linnakangas
On 19.10.2010 08:51, Fujii Masao wrote:
On Tue, Oct 19, 2010 at 7:00 AM, Jeff Davis wrote:
Do users have any expectation that they can restore a backup without using recovery.conf by merely having the WAL segments in pg_xlog?
I would expect that to work. What's the use case?
Creating a st

Re: [BUGS] Recovery bug

2010-10-18 Fujii Masao
On Tue, Oct 19, 2010 at 7:00 AM, Jeff Davis wrote:
>> > Do users have any expectation that they can restore a backup without
>> > using recovery.conf by merely having the WAL segments in pg_xlog?
>>
>> I would expect that to work. What's the use case?
> If that's the expectation, I believe my or

Re: [BUGS] Recovery bug

2010-10-18 Jeff Davis
On Mon, 2010-10-18 at 17:51 -0400, Robert Haas wrote:
> On Mon, Oct 18, 2010 at 2:07 PM, Jeff Davis wrote:
> > On Mon, 2010-10-18 at 17:02 +0900, Fujii Masao wrote:
> >> Yep, to automatically delete backup_label and continue recovery seems to be
> >> dangerous. How about just emitting FATAL error

Re: [BUGS] Recovery bug

2010-10-18 Robert Haas
On Mon, Oct 18, 2010 at 2:07 PM, Jeff Davis wrote:
> On Mon, 2010-10-18 at 17:02 +0900, Fujii Masao wrote:
>> Yep, to automatically delete backup_label and continue recovery seems to be
>> dangerous. How about just emitting FATAL error when neither restore_command
>> nor primary_conninfo is suppli

Re: [BUGS] Recovery bug

2010-10-18 Jeff Davis
On Mon, 2010-10-18 at 17:02 +0900, Fujii Masao wrote:
> Yep, to automatically delete backup_label and continue recovery seems to be
> dangerous. How about just emitting FATAL error when neither restore_command
> nor primary_conninfo is supplied and backup_label exists? This seems to be
> simpler th
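The rule proposed here reduces to a single predicate. In this sketch the struct and function names are hypothetical, and "not supplied" is assumed to mean unset or empty; in the server the true case would be reported with a FATAL error rather than returned.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of the startup inputs that matter for this rule. */
typedef struct {
    bool have_backup_label;       /* backup_label file is present */
    const char *restore_command;  /* NULL or "" when not supplied */
    const char *primary_conninfo; /* NULL or "" when not supplied */
} sketch_startup_cfg;

/* Treat both NULL and the empty string as "not supplied". */
static bool sketch_is_set(const char *s)
{
    return s != NULL && s[0] != '\0';
}

/* True when startup should stop with FATAL under the proposed rule:
   backup_label exists but there is no way to fetch older WAL. */
static bool sketch_should_refuse_startup(const sketch_startup_cfg *cfg)
{
    return cfg->have_backup_label &&
           !sketch_is_set(cfg->restore_command) &&
           !sketch_is_set(cfg->primary_conninfo);
}
```

The design choice is to refuse rather than guess: with a backup_label but no archive or primary to read from, the server cannot tell a restored backup from a crashed primary, so it stops and leaves the decision to the administrator.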

Re: [BUGS] Recovery bug

2010-10-18 Fujii Masao
>> Send a SIGQUIT to the postmaster to simulate a crash. When you bring it
>> back up, it thinks it is recovering from a backup, so it reads
>> backup_label. The checkpoint for the backup label is in 00...6, so it
>> reads that just fine. But then it tries to read the WAL starting at the
>> redo lo
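The "00...6" above refers to a WAL segment file. As a hedged illustration of how a checkpoint record and its REDO location can land in different files, the sketch below maps a 64-bit WAL position to a segment file name, assuming the default 16 MB segment size and the usual 24-hex-digit timeline/log/segment naming. The function names and sample positions are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumes the default 16 MB WAL segment size. */
#define SKETCH_SEG_SIZE UINT64_C(0x1000000)
/* Segments per 4 GB "log id" with that segment size (256). */
#define SKETCH_SEGS_PER_LOGID (UINT64_C(0x100000000) / SKETCH_SEG_SIZE)

/* Segment number that contains a given 64-bit WAL position. */
static uint64_t sketch_lsn_to_segno(uint64_t lsn)
{
    return lsn / SKETCH_SEG_SIZE;
}

/* Writes the segment file name (timeline, log id, segment; 8 hex digits
   each) for lsn into buf, which needs room for 25 bytes. */
static void sketch_wal_file_name(char *buf, size_t buflen, uint32_t tli,
                                 uint64_t lsn)
{
    uint64_t segno = sketch_lsn_to_segno(lsn);
    snprintf(buf, buflen, "%08X%08X%08X", (unsigned) tli,
             (unsigned) (segno / SKETCH_SEGS_PER_LOGID),
             (unsigned) (segno % SKETCH_SEGS_PER_LOGID));
}
```

Under these assumptions, on timeline 1 a checkpoint record at 0x6000028 is in 000000010000000000000006, while a REDO location slightly earlier, at 0x5FFFF00, is in 000000010000000000000005. Reading the checkpoint succeeds even if only the later file is present; the sample positions are illustrative only.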

Re: [BUGS] Recovery bug

2010-10-17 Jeff Davis
On Fri, 2010-10-15 at 15:58 -0700, Jeff Davis wrote:
> I don't have a fix yet, because I think it requires a little discussion.
> For instance, it seems to be dangerous to assume that we're starting up
> from a backup with access to the archive when it might have been a crash
> of the primary syste

[BUGS] Recovery bug

2010-10-15 Jeff Davis
This occurs on the master branch, but also pretty much everywhere else.

To reproduce:

First, it requires a little setup to make sure things happen in the right (wrong?) order. In postgresql.conf I set:

  archive_mode = on
  archive_command = 'echo -n archiving: %f... && while [ ! -f /tmp/a ]; do