On 29 March 2018 at 01:50, Robert Haas <robertmh...@gmail.com> wrote: > On Tue, Mar 27, 2018 at 11:53 PM, Pavan Deolasee > <pavan.deola...@gmail.com> wrote: >> Yeah, we should not do that. The patch surely does not intend to replay any >> more WAL than what we do today. The idea is to just use a different >> mechanism to find the prior checkpoint. But we should surely find the latest >> prior checkpoint. In the rare scenario that Tom showed, we should just throw >> an error and fix the patch if it's not doing that already. > > It's not clear to me that there is any reasonable fix short of giving > up on this approach altogether. But I might be missing something.
(Sorry to come back to this late) Yes, a few things have been been missed in this discussion. Michael's point is invalid. The earlier discussion said that *replay* of WAL earlier than the last checkpoint could lead to corruption, but this doesn't mean the records themselves are in some way damaged structurally, only that their content can lead to inconsistency in the database. The xl_curr proposal does not replay WAL records it only searches thru them to locate the last checkpoint. Tom also said "Well, the point of checkpoints is that WAL data before the last one should no longer matter anymore, isn't it?". WAL Files prior to the one containing the last checkpoint can be discarded, but checkpoints themselves do nothing to invalidate the WAL records before the checkpoint. Tom's point that a corruption in the WAL file could prevent pg_rewind from locating a record is valid, but misses something. If the WAL file contained corruption AFTER the last checkpoint this would already in the current code prevent pg_rewind from moving backwards to find the last checkpoint. So whether you go forwards or backwards it is possible to be blocked by corrupt WAL records. Most checkpoints are far enough apart that there is only one checkpoint record in any WAL file and it could be anywhere in the file, so the expected number of records to be searched by forwards or backwards scans is equal. And the probability that corruption occurs before or after the record is also equal. Given the above points, I don't see any reason to believe that the forwards search mechanism proposed is faulty, decreases robustness or is even slower. Overall, the loss of links between WAL records is not a loss of robustness. XLOG_SWITCH is a costly operation and mostly unusable these days, so shouldn't be a reason to shy away from this patch. We don't jump around in WAL in any other way, so sequential records are just fine. So IMHO the proposed approach is still very much viable. I know the approach is new and surprising but I thought about it a lot before proposing it and I couldn't see any holes; still can't. Please give this some thought so we can get comfortable with this idea and increase performance as a result. Thanks. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services