Here's v2.

Jakub Wartak pointed out to me off-list that this broke the case where
a chain of incrementals crosses a timeline switch. That made me
realize that I also need to add the WAL level to XLOG_END_OF_RECOVERY,
so this version does that.

I also forgot to mention that this patch changes behavior in the case
where you've been running with summarize_wal=off for a while and then
you turned it on. Previously, we'd start summarizing from the oldest
WAL record we could still read from pg_wal. Now, we'll start
summarizing from the first checkpoint (or timeline switch) after that.
That's necessary, because when we read the oldest record available, we
can't know for sure what WAL level was used to generate it, so we have
to assume the worst case, i.e. minimal, and thus skip summarizing that
WAL. However, it's also harmless, because a WAL summary that covers
part of a checkpoint cycle is useless to us anyway. We need complete
WAL summaries from the start of the prior backup to the start of the
current one to be able to do an incremental backup, and the previous
backup and the current backup must have each started with a
checkpoint, so a summary covering part of a checkpoint cycle can never
make an incremental backup possible where it would not otherwise have
been possible.
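
To make that argument a bit more concrete, here is a rough Python sketch.
It is a toy model, not code from the patch: LSNs are plain integers, and
the Record class, first_summarizable_lsn() and can_take_incremental() are
names invented for this illustration. It assumes that checkpoint and
end-of-recovery records are the only ones that tell us the WAL level, and
that a backup always starts at a checkpoint.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Record:
    lsn: int                         # toy LSN: a plain integer
    kind: str                        # "checkpoint", "end_of_recovery", or "other"
    wal_level: Optional[str] = None  # assumed known only for the first two kinds


def first_summarizable_lsn(records: List[Record]) -> Optional[int]:
    """Return the LSN where summarizing may begin, or None if there is none.

    Records that precede the first checkpoint or end-of-recovery record are
    skipped: nothing tells us which WAL level generated them, so the model
    treats them as if they were written with wal_level=minimal.
    """
    for rec in records:
        if rec.kind in ("checkpoint", "end_of_recovery") and rec.wal_level in (
            "replica",
            "logical",
        ):
            return rec.lsn
    return None


def can_take_incremental(
    summaries: List[Tuple[int, int]], prev_start: int, cur_start: int
) -> bool:
    """True if the half-open ranges in `summaries` cover [prev_start, cur_start)
    without a gap."""
    pos = prev_start
    for start, end in sorted(summaries):
        if start > pos:
            break  # gap in coverage; later summaries cannot repair it
        pos = max(pos, end)
    return pos >= cur_start
```

With checkpoints at LSNs 100, 200 and 300, adding a partial summary such
as (60, 100) to the list never changes the answer of
can_take_incremental() for any pair of backup start points taken from
those checkpoints, which is the sense in which skipping that WAL is
harmless.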

One more thing I forgot to mention is that we can't fix this problem
by making summarize_wal PGC_POSTMASTER. That doesn't work because of
what is mentioned in the previous paragraph: when summarize_wal is
turned on, the summarizer goes back and tries to summarize any older
WAL that is still around, so we need this infrastructure to know whether
or not that older WAL is safe to summarize. And I don't think we can
remove the
behavior where we back up and try to summarize old WAL, either,
because then after a crash you'd always have a gap in your summary
files and you would have to take a new full backup afterwards, which
would suck. I continue to think that a lot of the value of this
feature is in making sure that it *always* works -- when you start to
add cases where full backups are required, this becomes a lot less
useful to the target audience for the feature, namely, people whose
databases are so large that full backups take an unreasonably long
time to complete.

...Robert

Attachment: v2-0001-Do-not-summarize-WAL-if-generated-with-wal_level-.patch
Description: Binary data