> On Jan 24, 2020, at 8:36 AM, David Steele <da...@pgmasters.net> wrote:
> 
>> I don't entirely follow why we're discussing this at all, if the
>> requirement is backing up a PG data directory.  There are not, and
>> are never likely to be, any legitimate files with non-ASCII names
>> in that context.  Why can't we just skip any such files?
> 
> It's not uncommon in my experience for users to drop odd files into PGDATA 
> (usually versioned copies of postgresql.conf, etc.), but I agree that it 
> should be discouraged.  Even so, I don't recall ever seeing any non-ASCII 
> filenames.
> 
> Skipping files sounds scary, I'd prefer an error or a warning (and then 
> base64 encode the filename).

I tend to agree with Tom.  We know that postgres doesn’t write any such files 
now, and if we ever decided to change that, we could change this, too.  So for 
now, we can assume any such files are not ours.  Either the user manually 
scribbled in this directory, or had a tool (antivirus checksum file, vim 
.WHATEVER.swp file, etc.) that did so.  Raising an error would break any 
automated backup process that hit this issue, and base64 encoding the file name 
and backing up the file contents could grab data that the user would not 
reasonably expect in the backup.  But this argument applies equally well to 
such files regardless of filename encoding.  It would be odd to back them up 
when they happen to be valid UTF-8/ASCII/whatever, but not do so when they are 
not valid.  I would expect, therefore, that we only back up files which match 
our expected file name pattern and ignore (perhaps with a warning) everything 
else.
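
To make that concrete, here is a minimal sketch in Python of "back up only 
what matches, warn about the rest".  The patterns and the allowlist are 
hypothetical and purely illustrative; they are not the actual set of names 
PostgreSQL writes, and the function names are made up for this example.

```python
import re
import warnings

# Hypothetical patterns, for illustration only; NOT PostgreSQL's real rules.
# Relation-style names: digits, optional fork suffix, optional segment number.
RELATION_FILE = re.compile(r"\d+(_(fsm|vm|init))?(\.\d+)?")
# A tiny, made-up allowlist of other names we expect to see.
KNOWN_NAMES = {"PG_VERSION", "pg_control", "postgresql.conf"}


def is_expected(name):
    """Return True if the file name matches one of the expected patterns."""
    return name in KNOWN_NAMES or RELATION_FILE.fullmatch(name) is not None


def select_for_backup(names):
    """Keep expected names; warn about (and skip) everything else."""
    selected = []
    for name in names:
        if is_expected(name):
            selected.append(name)
        else:
            # repr() keeps odd or undecodable names printable in the warning.
            warnings.warn(f"skipping unexpected file: {name!r}")
    return selected
```

Note that the check never looks at the encoding of the name: a stray 
.postgresql.conf.swp is skipped for the same reason a non-ASCII name would be.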

Quoting from Robert’s email about why we want a backup manifest seems to 
support this idea, at least as I see it:

> So, let's suppose we invent a backup manifest. What should it contain?
> I imagine that it would consist of a list of files, and the lengths of
> those files, and a checksum for each file. I think you should have a
> choice of what kind of checksums to use, because algorithms that used
> to seem like good choices (e.g. MD5) no longer do; this trend can
> probably be expected to continue. Even if we initially support only
> one kind of checksum -- presumably SHA-something since we have code
> for that already for SCRAM -- I think that it would also be a good
> idea to allow for future changes. And maybe it's best to just allow a
> choice of SHA-224, SHA-256, SHA-384, and SHA-512 right out of the
> gate, so that we can avoid bikeshedding over which one is secure
> enough. I guess we'll still have to argue about the default. I also
> think that it should be possible to build a manifest with no
> checksums, so that one need not pay the overhead of computing
> checksums if one does not wish. Of course, such a manifest is of much
> less utility for checking backup integrity, but you can still check
> that you've got the right files, which is noticeably better than
> nothing.  The manifest should probably also contain a checksum of its
> own contents so that the integrity of the manifest itself can be
> verified. And maybe a few other bits of metadata, but I'm not sure
> exactly what.  Ideas?
> 
> Once we invent the concept of a backup manifest, what do we need to do
> with them? I think we'd want three things initially:
> 
> (1) When taking a backup, have the option (perhaps enabled by default)
> to include a backup manifest.
> (2) Given an existing backup that has not got a manifest, construct one.
> (3) Cross-check a manifest against a backup and complain about extra
> files, missing files, size differences, or checksum mismatches.


Nothing in there sounds to me like it needs to include random cruft.
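
For what it's worth, the manifest Robert describes (file list, lengths, an 
optional checksum per file, a checksum of the manifest itself) plus the 
cross-check in (3) can be sketched in a few lines of Python.  This is only an 
illustration under my own assumptions: the dict layout, the JSON 
serialization used for the manifest checksum, and all function names are 
invented here and are not a proposed format.

```python
import hashlib
import json
import os

# The four algorithms mentioned in the quoted text.
ALGORITHMS = {"sha224", "sha256", "sha384", "sha512"}


def file_checksum(path, algorithm):
    """Hash a file in chunks so large files need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root, algorithm="sha256"):
    """Record size (and optionally a checksum) for every file under root.

    algorithm=None builds a manifest without checksums.
    """
    if algorithm is not None and algorithm not in ALGORITHMS:
        raise ValueError(f"unsupported checksum algorithm: {algorithm}")
    files = {}
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            entry = {"size": os.path.getsize(path)}
            if algorithm is not None:
                entry["checksum"] = file_checksum(path, algorithm)
            files[os.path.relpath(path, root)] = entry
    return {"algorithm": algorithm, "files": files}


def manifest_checksum(manifest):
    """Checksum of the manifest itself (assumed serialization: sorted JSON)."""
    data = json.dumps(manifest, sort_keys=True).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def verify(root, manifest):
    """Report missing files, extra files, size and checksum mismatches."""
    expected = manifest["files"]
    current = build_manifest(root, manifest["algorithm"])["files"]
    problems = []
    for rel in sorted(expected.keys() - current.keys()):
        problems.append(f"missing: {rel}")
    for rel in sorted(current.keys() - expected.keys()):
        problems.append(f"extra: {rel}")
    for rel in sorted(expected.keys() & current.keys()):
        if expected[rel]["size"] != current[rel]["size"]:
            problems.append(f"size mismatch: {rel}")
        elif expected[rel].get("checksum") != current[rel].get("checksum"):
            problems.append(f"checksum mismatch: {rel}")
    return problems
```

With a manifest built this way, any stray file that got swept into the backup 
would simply be one more entry to verify, which is the part I would rather 
avoid by not backing it up in the first place.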

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




