On 21.12.2012 08:18, Amit Kapila wrote:
> On Thursday, December 20, 2012 11:15 PM Heikki Linnakangas wrote:
>> On 20.12.2012 18:19, Fujii Masao wrote:
>>> InstallXLogFileSegment() also uses ThisTimeLineID. But your recent commit
>>> doesn't take care of it and prevents the standby from recycling the WAL
>>> files properly. Specifically, the standby recycles the WAL file to wrong
>>> name.
>>
>> A-ha, good catch. So that's actually a live bug in 9.1 and 9.2 as well:
>> after the recovery target timeline has changed, restartpoints will
>> continue to preallocate/recycle WAL files for the old timeline. That's
>> otherwise harmless, but the useless WAL files waste space, and
>> walreceiver will have to always create new files.
>>
>> So instead of always running with ThisTimeLineID = 0 in the checkpointer
>> process, I guess we'll have to update it to the timeline being replayed,
>> when creating a restartpoint.
>
> Shouldn't there be a check if(RecoveryInProgress), before assigning
> RecoveryTargetTLI to ThisTimeLineID in CreateRestartPoint()?

Hmm, I don't think so. You're not supposed to get that far in
CreateRestartPoint() if recovery has already ended, or is just being ended.
The startup process "ends recovery", i.e. makes RecoveryInProgress()
return false, only after writing the end-of-recovery checkpoint. And
after the end-of-recovery checkpoint has been written,
CreateRestartPoint() will do nothing, because the end-of-recovery
checkpoint is later than the last potential restartpoint. I'm talking
about this check in CreateRestartPoint():

    if (XLogRecPtrIsInvalid(lastCheckPointRecPtr) ||
        XLByteLE(lastCheckPoint.redo, ControlFile->checkPointCopy.redo))
    {
        ereport(DEBUG2,
                (errmsg("skipping restartpoint, already performed at %X/%X",
                        (uint32) (lastCheckPoint.redo >> 32),
                        (uint32) lastCheckPoint.redo)));
        ...
        return false;
    }

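
As a rough illustration of why that check makes CreateRestartPoint() a
no-op once the end-of-recovery checkpoint is in pg_control, here is a
minimal stand-alone model. It is a sketch, not PostgreSQL source: WalPos,
INVALID_WAL_POS and restartpoint_should_skip() are hypothetical names, and
WAL positions are simplified to plain 64-bit integers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for XLogRecPtr: a 64-bit WAL position. */
typedef uint64_t WalPos;
#define INVALID_WAL_POS ((WalPos) 0)

/*
 * Model of the skip test quoted above (names are assumptions).
 * Returns true when there is nothing for a restartpoint to do:
 *   - no checkpoint record has been replayed yet, or
 *   - pg_control already records a checkpoint whose redo position is
 *     at or beyond the one we would use.
 */
bool
restartpoint_should_skip(WalPos last_checkpoint_ptr,
                         WalPos last_checkpoint_redo,
                         WalPos control_file_redo)
{
    if (last_checkpoint_ptr == INVALID_WAL_POS)
        return true;

    /* Same comparison as XLByteLE(lastCheckPoint.redo, control redo). */
    return last_checkpoint_redo <= control_file_redo;
}
```

In this model, once the end-of-recovery checkpoint has advanced
control_file_redo past every replayed checkpoint record, the function
returns true for any later call, which is the behaviour the argument
above relies on.
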
However, there's this just before we recycle WAL segments:

    /*
     * Update pg_control, using current time.  Check that it still shows
     * IN_ARCHIVE_RECOVERY state and an older checkpoint, else do nothing;
     * this is a quick hack to make sure nothing really bad happens if somehow
     * we get here after the end-of-recovery checkpoint.
     */
    LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
    if (ControlFile->state == DB_IN_ARCHIVE_RECOVERY &&
        XLByteLT(ControlFile->checkPointCopy.redo, lastCheckPoint.redo))
    {
        ...

but I believe that "quick hack" is just paranoia. You should not get
that far after the end-of-recovery checkpoint.
In any case, if you somehow get there anyway, the worst that will happen
is that some old WAL segments are recycled/preallocated on the old
timeline, wasting some space until the next checkpoint.
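
To make "recycled on the old timeline" concrete: a WAL segment file name
starts with the timeline ID, so the same segment number maps to a
different file name on each timeline. The sketch below is only an
illustration under stated assumptions. wal_file_name() and
SEGMENTS_PER_XLOG_ID are hypothetical names, and the arithmetic assumes
the default 16 MB segment size with the 24-hex-digit layout (timeline,
then two 8-digit parts); it is not the actual XLogFileName() macro.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: 16 MB segments, so 256 segments per 4 GB "xlog id". */
#define SEGMENTS_PER_XLOG_ID 256

/*
 * Hypothetical helper: build a 24-character WAL file name from a
 * timeline ID and a segment number.  buf must hold at least 25 bytes.
 */
void
wal_file_name(char *buf, size_t len, uint32_t tli, uint64_t segno)
{
    snprintf(buf, len, "%08X%08X%08X",
             (unsigned) tli,
             (unsigned) (segno / SEGMENTS_PER_XLOG_ID),
             (unsigned) (segno % SEGMENTS_PER_XLOG_ID));
}
```

With this helper, segment 5 is named 000000010000000000000005 on
timeline 1 and 000000020000000000000005 on timeline 2. A file recycled
under a stale timeline ID therefore gets a name that replay on the new
timeline never asks for, which is why it only wastes space.
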
- Heikki
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)