On Thu, 2009-01-29 at 11:20 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > On Thu, 2009-01-29 at 10:36 +0900, Fujii Masao wrote:
> >> Hi,
> >>
> >> On Wed, Jan 28, 2009 at 11:19 PM, Fujii Masao <masao.fu...@gmail.com>
> >> wrote:
> >>>> I feel quite good about this patch now. Given the amount of code
> >>>> churn, it requires testing, and I'll read it through one more time
> >>>> after sleeping over it. Simon, do you see anything wrong with this?
> >>> I also read this patch and found something odd. I apologize if I
> >>> misread it..
> >> If archive recovery fails after it reaches the last valid record
> >> in the last unfilled WAL segment, subsequent recovery might cause
> >> the following fatal error. This is because minSafeStartPoint indicates
> >> the end of the last unfilled WAL segment which subsequent recovery
> >> cannot reach. Is this bug? (I'm not sure how to fix this problem
> >> because I don't understand yet why minSafeStartPoint is required.)
> >>
> >>> FATAL: WAL ends before end time of backup dump
> >
> > I think you're right. We need a couple of changes to avoid confusing
> > messages.
>
> Hmm, we could update minSafeStartPoint in XLogFlush instead. That was
> suggested when the idea of minSafeStartPoint was first thought of.
> Updating minSafeStartPoint is analogous to flushing WAL:
> minSafeStartPoint must be advanced to X before we can flush a data page
> with LSN X. To avoid excessive controlfile updates, whenever we update
> minSafeStartPoint, we can update it to the latest WAL record we've read.
>
> Or we could simply ignore that error if we've reached minSafeStartPoint
> - 1 segment, assuming that even though minSafeStartPoint is higher, we
> can't have gone past the end of valid WAL records in the last segment in
> previous recovery either. But that feels more fragile.

My proposed fix for Fujii-san's minSafeStartPoint bug is to introduce
another control file state, DB_IN_ARCHIVE_RECOVERY_BASE. This would show
that we are still recovering up to the point of the end of the base
backup. Once we reach minSafeStartPoint we then switch state to
DB_IN_ARCHIVE_RECOVERY and set the baseBackupReached boolean, which then
enables writing new minSafeStartPoints when we open new WAL files in the
future.

We then issue these messages only when in the DB_IN_ARCHIVE_RECOVERY_BASE
state:

    if (XLByteLT(EndOfLog, ControlFile->minRecoveryPoint) &&
        ControlFile->state == DB_IN_ARCHIVE_RECOVERY_BASE)
    {
        if (reachedStopPoint)   /* stopped because of stop request */
            ereport(FATAL,
                    (errmsg("requested recovery stop point is before end time of backup dump")));
        else                    /* ran off end of WAL */
            ereport(FATAL,
                    (errmsg("WAL ends before end time of backup dump")));
    }

-- 
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support
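
For anyone following the thread, here is a rough standalone sketch of the
state transition proposed above. It is not the patch and does not use the
real PostgreSQL definitions: the Lsn type, ControlFileSketch, EndCheck and
the three function names are hypothetical, and WAL positions are modelled
as plain integers. Only DB_IN_ARCHIVE_RECOVERY_BASE, DB_IN_ARCHIVE_RECOVERY,
minSafeStartPoint, baseBackupReached, EndOfLog and reachedStopPoint come
from the discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in: a WAL position modelled as a flat integer. */
typedef uint64_t Lsn;

typedef enum
{
    DB_IN_ARCHIVE_RECOVERY_BASE,    /* still replaying up to end of base backup */
    DB_IN_ARCHIVE_RECOVERY          /* end of base backup has been reached */
} DbState;

/* Hypothetical, heavily reduced control file. */
typedef struct
{
    DbState state;
    Lsn     minSafeStartPoint;
    bool    baseBackupReached;
} ControlFileSketch;

typedef enum
{
    RECOVERY_OK,
    ERR_STOP_BEFORE_BACKUP_END,     /* "requested recovery stop point is before ..." */
    ERR_WAL_ENDS_BEFORE_BACKUP_END  /* "WAL ends before end time of backup dump" */
} EndCheck;

/* After replaying a record: switch state once minSafeStartPoint is reached. */
void
note_replayed(ControlFileSketch *cf, Lsn replayedUpTo)
{
    assert(cf != NULL);
    if (cf->state == DB_IN_ARCHIVE_RECOVERY_BASE &&
        replayedUpTo >= cf->minSafeStartPoint)
    {
        cf->state = DB_IN_ARCHIVE_RECOVERY;
        cf->baseBackupReached = true;
    }
}

/* On opening a new WAL file: advance only after the base backup end. */
void
note_new_wal_file(ControlFileSketch *cf, Lsn segmentEnd)
{
    assert(cf != NULL);
    if (cf->baseBackupReached && segmentEnd > cf->minSafeStartPoint)
        cf->minSafeStartPoint = segmentEnd;
}

/* End of recovery: complain only while still in the BASE state. */
EndCheck
check_end_of_recovery(const ControlFileSketch *cf, Lsn endOfLog,
                      bool reachedStopPoint)
{
    assert(cf != NULL);
    if (endOfLog < cf->minSafeStartPoint &&
        cf->state == DB_IN_ARCHIVE_RECOVERY_BASE)
        return reachedStopPoint ? ERR_STOP_BEFORE_BACKUP_END
                                : ERR_WAL_ENDS_BEFORE_BACKUP_END;
    return RECOVERY_OK;
}
```

With minSafeStartPoint at 100, ending at 50 while still in the BASE state
gives one of the two errors. Once replay has reached 100 and
minSafeStartPoint has been advanced to a segment end of 200, ending at 150
is accepted, which is the case Fujii-san reported.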