At Mon, 23 Jul 2018 15:59:16 +0900, Michael Paquier <mich...@paquier.xyz> wrote in <20180723065916.gi2...@paquier.xyz> > On Mon, Jul 23, 2018 at 01:57:48PM +0900, Kyotaro HORIGUCHI wrote: > > I'll register this to the next commitfest. > > > > https://www.postgresql.org/message-id/20180719.125117.155470938.horiguchi.kyot...@lab.ntt.co.jp > > This is an open item for v11.
Mmm. Thanks. I wrongly thought this was v10 item. Removed this from the next CF. Thank you for taking a look. > >> While considering this, I found a bug in 4b0d28de06, which > >> removed prior checkpoint from control file. It actually trims the > >> segments before the last checkpoint's redo segment but recycling > >> is still considered based on the *prevous* checkpoint. As the > >> result min_wal_size doesn't work as told. Specifically, setting > >> min/max_wal_size to 48MB and advance four or more segments then > >> two checkpoints leaves just one segment, which is less than > >> min_wal_size. > >> > >> The attached patch fixes that. One arguable point on this would > >> be the removal of the behavior when RemoveXLogFile(name, > >> InvalidXLogRecPtr, ..). > >> > >> The only place calling the function with the parameter is > >> timeline switching. Previously unconditionally 10 segments are > >> recycled after switchpoint but the reason for the behavior is we > >> didn't have the information on previous checkpoint at hand at the > >> time. But now we can use the timeline switch point as the > >> approximate of the last checkpoint's redo point and this allows > >> us to use min/max_wal_size properly at the time. > > I have been looking at that, and tested with this simple scenario: > create table aa (a int); > > Then just repeat the following SQLs: > insert into aa values (1); > select pg_switch_wal(); > checkpoint; > select pg_walfile_name(pg_current_wal_lsn()); > > By doing so, you would notice that the oldest WAL segment does not get > recycled after the checkpoint, while it should as the redo pointer is > always checkpoint generated. What happens is that this oldest segment > gets recycled every two checkpoints. (I'm not sure I'm getting it correctly..) I see the old segments recycled. When I see pg_switch_wal() returns 0/12.., pg_walfile_name is ...13 and segment files for 13 and 14 are found in pg_wal directory. That is, seg 12 was recycled as seg 14. log_checkpoint=on shows every checkpoint recycles 1 segment in the case. > Then I had a look at the proposed patch, with a couple of comments. > > - if (PriorRedoPtr == InvalidXLogRecPtr) > - recycleSegNo = endlogSegNo + 10; > - else > - recycleSegNo = XLOGfileslop(PriorRedoPtr); > + recycleSegNo = XLOGfileslop(RedoRecPtr); > I think that this is a new behavior, and should not be changed, see > point 3 below. > > In CreateCheckPoint(), the removal of past WAL segments is always going > to happen as RedoRecPtr is never InvalidXLogRecPtr, so the recycling > should happen also when PriorRedoPtr is InvalidXLogRecPtr, no? Yes. The reason for the change was the change of RemoveNonParentXlogFiles that I made and it is irrelevant to this bug fix (and rather breaking as you pointed..). So, reverted it. > /* Trim from the last checkpoint, not the last - 1 */ > This comment could be adjusted, let's remove "not the last - 1" . Oops! Thanks. The comment has finally vanished and melded into another comment just above. | * Delete old log files not required by the last checkpoint and recycle | * them > The calculation of _logSegNo in CreateRestartPoint is visibly incorrect, > this still uses PriorRedoPtr so the bug is not fixed for standbys. The > tweaks for ThisTimeLineID also need to be out of the loop where > PriorRedoPtr is InvalidXLogRecPtr, and only the distance calculation > should be kept. Agreed. While I reconsidered this, I noticed that the estimated checkpoint distance is 0 when PriorRedoPtr is invalid. This means that the first checkpoint/restartpoint tries to reduce WAL segments to min_wal_size. However, it happens only at initdb time and makes no substantial behavior change so I made the change ignoring the difference. > Finally, in summary, this patch is doing actually three things: > 1) Rename a couple of variables which refer incorrectly to the prior > checkpoint so as they refer to the last checkpoint generated. > 2) Make sure that WAL recycling/removal happens based on the last > checkpoint generated, which is the one just generated when past WAL > segments are cleaned up as post-operation actions. > 3) Enforce the recycling point to be the switch point instead of > arbitrarily recycle 10 segments, which is what b2a5545b has introduced. > Recycling at exactly the switch point is wrong, as the redo LSN of the > previous checkpoint may not be at the same segment when a timeline has > switched, so you would finish with recycling segments which are still > needed if an instance needs be crash-recovered post-promotion without > a completed post-recovery checkpoint. In short this is dangerous. > I'll let Heikki comment on that, but I think that you should fetch the > last redo LSN instead, still you need to be worried about checkpoints > running in parallel of the startup process, so I would stick with the > current logic. Thank you for the detail. I was coufused a bit there. I agree to preserve the logic, too. > 1) and 2) are in my opinion clearly oversights from 4b0d28d, but 3) is > not. Thank you for the comments and suggestions. After all, I did the following things in the attached patch. - Reverted the change on timeline switching. (Removed the (3)) - Fixed CreateRestartPoint to do the same thing with CreateCheckPoint. - Both CreateRestart/CheckPoint now tries trimming of WAL segments even for the first time. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
>From 662ecab9c2d0efc41756d98e18dbabe060da8d96 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horiguchi.kyot...@lab.ntt.co.jp> Date: Thu, 19 Jul 2018 12:13:56 +0900 Subject: [PATCH] Fix calculation base of WAL recycling The commit 4b0d28de06 removed the prior checkpoint and related things but that leaves WAL recycling based on the prior checkpoint. This makes max_wal_size and min_wal_size work incorrectly. This patch makes WAL recycling be based on the last checkpoint. --- src/backend/access/transam/xlog.c | 159 ++++++++++++++++++-------------------- 1 file changed, 75 insertions(+), 84 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 335b4a573d..05f750253f 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -2287,7 +2287,7 @@ assign_checkpoint_completion_target(double newval, void *extra) * XLOG segments? Returns the highest segment that should be preallocated. */ static XLogSegNo -XLOGfileslop(XLogRecPtr PriorRedoPtr) +XLOGfileslop(XLogRecPtr RedoRecPtr) { XLogSegNo minSegNo; XLogSegNo maxSegNo; @@ -2299,9 +2299,9 @@ XLOGfileslop(XLogRecPtr PriorRedoPtr) * correspond to. Always recycle enough segments to meet the minimum, and * remove enough segments to stay below the maximum. */ - minSegNo = PriorRedoPtr / wal_segment_size + + minSegNo = RedoRecPtr / wal_segment_size + ConvertToXSegs(min_wal_size_mb, wal_segment_size) - 1; - maxSegNo = PriorRedoPtr / wal_segment_size + + maxSegNo = RedoRecPtr / wal_segment_size + ConvertToXSegs(max_wal_size_mb, wal_segment_size) - 1; /* @@ -2316,7 +2316,7 @@ XLOGfileslop(XLogRecPtr PriorRedoPtr) /* add 10% for good measure. */ distance *= 1.10; - recycleSegNo = (XLogSegNo) ceil(((double) PriorRedoPtr + distance) / + recycleSegNo = (XLogSegNo) ceil(((double) RedoRecPtr + distance) / wal_segment_size); if (recycleSegNo < minSegNo) @@ -3899,12 +3899,12 @@ RemoveTempXlogFiles(void) /* * Recycle or remove all log files older or equal to passed segno. * - * endptr is current (or recent) end of xlog, and PriorRedoRecPtr is the - * redo pointer of the previous checkpoint. These are used to determine + * endptr is current (or recent) end of xlog, and RedoRecPtr is the + * redo pointer of the last checkpoint. These are used to determine * whether we want to recycle rather than delete no-longer-wanted log files. */ static void -RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr PriorRedoPtr, XLogRecPtr endptr) +RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr RedoRecPtr, XLogRecPtr endptr) { DIR *xldir; struct dirent *xlde; @@ -3947,7 +3947,7 @@ RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr PriorRedoPtr, XLogRecPtr endptr) /* Update the last removed location in shared memory first */ UpdateLastRemovedPtr(xlde->d_name); - RemoveXlogFile(xlde->d_name, PriorRedoPtr, endptr); + RemoveXlogFile(xlde->d_name, RedoRecPtr, endptr); } } } @@ -4021,14 +4021,14 @@ RemoveNonParentXlogFiles(XLogRecPtr switchpoint, TimeLineID newTLI) /* * Recycle or remove a log file that's no longer needed. * - * endptr is current (or recent) end of xlog, and PriorRedoRecPtr is the - * redo pointer of the previous checkpoint. These are used to determine + * endptr is current (or recent) end of xlog, and RedoRecPtr is the + * redo pointer of the last checkpoint. These are used to determine * whether we want to recycle rather than delete no-longer-wanted log files. - * If PriorRedoRecPtr is not known, pass invalid, and the function will - * recycle, somewhat arbitrarily, 10 future segments. + * If RedoRecPtr is not known, pass invalid, and the function will recycle, + * somewhat arbitrarily, 10 future segments. */ static void -RemoveXlogFile(const char *segname, XLogRecPtr PriorRedoPtr, XLogRecPtr endptr) +RemoveXlogFile(const char *segname, XLogRecPtr RedoRecPtr, XLogRecPtr endptr) { char path[MAXPGPATH]; #ifdef WIN32 @@ -4042,10 +4042,10 @@ RemoveXlogFile(const char *segname, XLogRecPtr PriorRedoPtr, XLogRecPtr endptr) * Initialize info about where to try to recycle to. */ XLByteToSeg(endptr, endlogSegNo, wal_segment_size); - if (PriorRedoPtr == InvalidXLogRecPtr) + if (RedoRecPtr == InvalidXLogRecPtr) recycleSegNo = endlogSegNo + 10; else - recycleSegNo = XLOGfileslop(PriorRedoPtr); + recycleSegNo = XLOGfileslop(RedoRecPtr); snprintf(path, MAXPGPATH, XLOGDIR "/%s", segname); @@ -8706,6 +8706,7 @@ CreateCheckPoint(int flags) bool shutdown; CheckPoint checkPoint; XLogRecPtr recptr; + XLogSegNo _logSegNo; XLogCtlInsert *Insert = &XLogCtl->Insert; uint32 freespace; XLogRecPtr PriorRedoPtr; @@ -9072,22 +9073,18 @@ CreateCheckPoint(int flags) */ smgrpostckpt(); - /* - * Delete old log files and recycle them - */ + /* Update the average distance between checkpoints/restartpoints. */ if (PriorRedoPtr != InvalidXLogRecPtr) - { - XLogSegNo _logSegNo; - - /* Update the average distance between checkpoints. */ UpdateCheckPointDistanceEstimate(RedoRecPtr - PriorRedoPtr); - /* Trim from the last checkpoint, not the last - 1 */ - XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size); - KeepLogSeg(recptr, &_logSegNo); - _logSegNo--; - RemoveOldXlogFiles(_logSegNo, PriorRedoPtr, recptr); - } + /* + * Delete old log files (those no longer needed for last checkpoint to + * prevent the disk holding the xlog from growing full. + */ + XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size); + KeepLogSeg(recptr, &_logSegNo); + _logSegNo--; + RemoveOldXlogFiles(_logSegNo, RedoRecPtr, recptr); /* * Make more log segments if needed. (Do this after recycling old log @@ -9253,6 +9250,11 @@ CreateRestartPoint(int flags) XLogRecPtr lastCheckPointEndPtr; CheckPoint lastCheckPoint; XLogRecPtr PriorRedoPtr; + XLogRecPtr receivePtr; + XLogRecPtr replayPtr; + TimeLineID replayTLI; + XLogRecPtr endptr; + XLogSegNo _logSegNo; TimestampTz xtime; /* @@ -9394,69 +9396,58 @@ CreateRestartPoint(int flags) } LWLockRelease(ControlFileLock); - /* - * Delete old log files (those no longer needed even for previous - * checkpoint/restartpoint) to prevent the disk holding the xlog from - * growing full. - */ + /* Update the average distance between checkpoints/restartpoints. */ if (PriorRedoPtr != InvalidXLogRecPtr) - { - XLogRecPtr receivePtr; - XLogRecPtr replayPtr; - TimeLineID replayTLI; - XLogRecPtr endptr; - XLogSegNo _logSegNo; - - /* Update the average distance between checkpoints/restartpoints. */ UpdateCheckPointDistanceEstimate(RedoRecPtr - PriorRedoPtr); - XLByteToSeg(PriorRedoPtr, _logSegNo, wal_segment_size); + /* + * Delete old log files (those no longer needed for last restartpoint to + * prevent the disk holding the xlog from growing full. + */ + XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size); - /* - * Get the current end of xlog replayed or received, whichever is - * later. - */ - receivePtr = GetWalRcvWriteRecPtr(NULL, NULL); - replayPtr = GetXLogReplayRecPtr(&replayTLI); - endptr = (receivePtr < replayPtr) ? replayPtr : receivePtr; + /* + * Retreat _logSegNo using the current end of xlog replayed or received, + * whichever is later. + */ + receivePtr = GetWalRcvWriteRecPtr(NULL, NULL); + replayPtr = GetXLogReplayRecPtr(&replayTLI); + endptr = (receivePtr < replayPtr) ? replayPtr : receivePtr; + KeepLogSeg(endptr, &_logSegNo); + _logSegNo--; + + /* + * Try to recycle segments on a useful timeline. If we've been promoted + * since the beginning of this restartpoint, use the new timeline chosen + * at end of recovery (RecoveryInProgress() sets ThisTimeLineID in that + * case). If we're still in recovery, use the timeline we're currently + * replaying. + * + * There is no guarantee that the WAL segments will be useful on the + * current timeline; if recovery proceeds to a new timeline right after + * this, the pre-allocated WAL segments on this timeline will not be used, + * and will go wasted until recycled on the next restartpoint. We'll live + * with that. + */ + if (RecoveryInProgress()) + ThisTimeLineID = replayTLI; - KeepLogSeg(endptr, &_logSegNo); - _logSegNo--; + RemoveOldXlogFiles(_logSegNo, RedoRecPtr, endptr); - /* - * Try to recycle segments on a useful timeline. If we've been - * promoted since the beginning of this restartpoint, use the new - * timeline chosen at end of recovery (RecoveryInProgress() sets - * ThisTimeLineID in that case). If we're still in recovery, use the - * timeline we're currently replaying. - * - * There is no guarantee that the WAL segments will be useful on the - * current timeline; if recovery proceeds to a new timeline right - * after this, the pre-allocated WAL segments on this timeline will - * not be used, and will go wasted until recycled on the next - * restartpoint. We'll live with that. - */ - if (RecoveryInProgress()) - ThisTimeLineID = replayTLI; + /* + * Make more log segments if needed. (Do this after recycling old log + * segments, since that may supply some of the needed files.) + */ + PreallocXlogFiles(endptr); - RemoveOldXlogFiles(_logSegNo, PriorRedoPtr, endptr); - - /* - * Make more log segments if needed. (Do this after recycling old log - * segments, since that may supply some of the needed files.) - */ - PreallocXlogFiles(endptr); - - /* - * ThisTimeLineID is normally not set when we're still in recovery. - * However, recycling/preallocating segments above needed - * ThisTimeLineID to determine which timeline to install the segments - * on. Reset it now, to restore the normal state of affairs for - * debugging purposes. - */ - if (RecoveryInProgress()) - ThisTimeLineID = 0; - } + /* + * ThisTimeLineID is normally not set when we're still in recovery. + * However, recycling/preallocating segments above needed ThisTimeLineID + * to determine which timeline to install the segments on. Reset it now, + * to restore the normal state of affairs for debugging purposes. + */ + if (RecoveryInProgress()) + ThisTimeLineID = 0; /* * Truncate pg_subtrans if possible. We can throw away all data before -- 2.16.3