On 3/27/20 3:29 PM, Robert Haas wrote:
On Fri, Mar 27, 2020 at 11:26 AM Stephen Frost <sfr...@snowman.net> wrote:
Seems better to (later?) add support for generating manifests for WAL
files, and then have a tool that can verify all the manifests required
to restore a base backup.
I'm not trying to expand on the feature set here or move the goalposts
way down the road, which seems to be what's being suggested
here. To be clear, I don't have any objection to adding a generic tool
for validating WAL as you're talking about here, but I also don't think
that's required for pg_validatebackup. What I do think we need is a
check of the WAL that's fetched when people use pg_basebackup -Xstream
or -Xfetch. pg_basebackup itself has that check because it's critical
to the backup being successful and valid. Not having that basic
validation of a backup really just isn't ok; there's a reason
pg_basebackup has that check.
I don't understand how this could be done without significantly
complicating the architecture. As I said before, -Xstream sends WAL
over a separate connection that is unrelated to the one running
BASE_BACKUP, so the base-backup connection doesn't know what to
include in the manifest. Now you could do something like: once all of
the WAL files have been fetched, the client checksums all of those and
sends their names and checksums to the server, which turns around and
puts them into the manifest, which it then sends back to the client.
But that is actually quite a bit of additional complexity, and it's
pretty strange, too, because now you have the client checksumming some
files and the server checksumming others. I know you mentioned a few
different ideas before, but I think they all kinda have some problem
along these lines.
I also kinda disagree with the idea that the WAL should be considered
an integral part of the backup. I don't know how pgbackrest does
things,
We checksum each WAL file while it is read and transmitted to the repo
by the archive_command. Then at the end of the backup we ensure that
all the WAL required to make the backup consistent has made it to the repo.
but BART stores each backup in a separate directory without any
associated WAL, and then keeps all the WAL together in a different
directory. I imagine that people who are using continuous archiving
also tend to use -Xnone, or if they do backups by copying the files
rather than using pg_backrest, they exclude pg_wal. In fact, for
people with big, important databases, I'd assume that would be the
normal pattern. You presumably wouldn't want to keep one copy of the
WAL files taken during the backup with the backup itself, and a
separate copy in the archive.
pgBackRest does provide the option to copy WAL into the backup directory
for the super-paranoid, though it is not the default. It is pretty handy
for moving individual backups to some other medium like tape, though.
If -Xnone is specified then it seems like pg_validatebackup is
completely off the hook. But in the case of -Xstream or -Xfetch
couldn't we at least verify that the expected WAL segments are present
and the correct size?
Storing the start/stop lsn in the manifest would be a nice thing to have
anyway and that would make this feature pretty trivial. Yeah, that's in
the backup_label file as well but the manifest is so much easier to read.
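
To make the idea concrete, here is a rough Python sketch of such a check. It
is not taken from any posted patch. It assumes the manifest exposes the
timeline and the start/stop LSNs, that the cluster uses the default 16MB WAL
segment size, and that the last segment needed is the one holding the byte
just before the stop LSN. All function names are hypothetical.

```python
import os

# Assumption: default segment size; clusters initialized with a different
# --wal-segsize would need to pass their own value.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def parse_lsn(text):
    """Convert an LSN in 'hi/lo' hexadecimal text form to an integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def wal_file_name(timeline, segno, seg_size=WAL_SEGMENT_SIZE):
    """Build the 24-hex-digit WAL file name for a segment number."""
    segs_per_xlogid = 0x100000000 // seg_size
    return "%08X%08X%08X" % (
        timeline,
        segno // segs_per_xlogid,
        segno % segs_per_xlogid,
    )


def expected_segments(timeline, start_lsn, stop_lsn, seg_size=WAL_SEGMENT_SIZE):
    """List the WAL file names covering start_lsn up to stop_lsn."""
    first = parse_lsn(start_lsn) // seg_size
    # Assumption: a stop LSN sitting exactly on a segment boundary does not
    # require the following segment, so use the byte just before it.
    last = (parse_lsn(stop_lsn) - 1) // seg_size
    return [wal_file_name(timeline, s, seg_size) for s in range(first, last + 1)]


def check_wal(wal_dir, timeline, start_lsn, stop_lsn, seg_size=WAL_SEGMENT_SIZE):
    """Return a list of problems: segments that are missing or mis-sized."""
    problems = []
    for name in expected_segments(timeline, start_lsn, stop_lsn, seg_size):
        path = os.path.join(wal_dir, name)
        if not os.path.isfile(path):
            problems.append("missing: " + name)
        elif os.path.getsize(path) != seg_size:
            problems.append("wrong size: " + name)
    return problems
```

Something like check_wal("backup/pg_wal", 1, start, stop) would return an
empty list when every expected segment is present at full size. This only
covers presence and size, not content, and it ignores timeline switches
during the backup.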
Regards,
--
-David
da...@pgmasters.net