On 1/24/20 9:27 AM, Tom Lane wrote:
Peter Eisentraut <peter.eisentr...@2ndquadrant.com> writes:
On 2020-01-23 18:04, Robert Haas wrote:
Now, you might say "well, why don't we just do an encoding
conversion?", but we can't. When the filesystem tells us what the file
names are, it does not tell us what encoding the person who created
those files had in mind. We don't know that they had*any* encoding in
mind. IIUC, a file in the data directory can have a name that consists
of any sequence of bytes whatsoever, so long as it doesn't contain
prohibited characters like a path separator or \0 byte. But only some
of those possible octet sequences can be stored in a manifest that has
to be valid UTF-8.
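To make the point concrete, here is a minimal Python sketch, offered only as an illustration and not taken from any patch. It shows the check a manifest writer would have to make before emitting a name as UTF-8 text. A name like b"\xe9t\xe9" ("été" in Latin-1) is a perfectly legal POSIX file name, but it does not decode as UTF-8. The helper name is_valid_utf8 is hypothetical.

```python
def is_valid_utf8(name: bytes) -> bool:
    """Return True if a raw file name decodes cleanly as UTF-8."""
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8(b"postgresql.conf"))      # True: plain ASCII
print(is_valid_utf8("été".encode("utf-8")))   # True: UTF-8 encoded
print(is_valid_utf8(b"\xe9t\xe9"))            # False: "été" in Latin-1
```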
I think it wouldn't be unreasonable to require that file names in the
database directory be consistently encoded (as defined by pg_control,
probably). After all, this information is sometimes also shown in
system views, so it's already difficult to process total junk. In
practice, this shouldn't be an onerous requirement.
I don't entirely follow why we're discussing this at all, if the
requirement is backing up a PG data directory. There are not, and
are never likely to be, any legitimate files with non-ASCII names
in that context. Why can't we just skip any such files?
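The skip rule is easy to state in code. Below is a minimal sketch that assumes names arrive as raw bytes, for example from os.listdir() called with a bytes path. The helper partition_ascii is hypothetical. It splits names into those made entirely of ASCII bytes, which are kept, and everything else, which would be skipped.

```python
from typing import Iterable, List, Tuple


def partition_ascii(names: Iterable[bytes]) -> Tuple[List[bytes], List[bytes]]:
    """Split raw file names into (kept, skipped).

    Names consisting only of ASCII bytes are kept; any other name would be
    skipped under the rule proposed above. Hypothetical helper for illustration.
    """
    kept: List[bytes] = []
    skipped: List[bytes] = []
    for name in names:
        if name.isascii():  # bytes.isascii() requires Python 3.7+
            kept.append(name)
        else:
            skipped.append(name)
    return kept, skipped


kept, skipped = partition_ascii([b"PG_VERSION", b"postgresql.conf.bak", b"\xe9t\xe9"])
print(kept)     # [b'PG_VERSION', b'postgresql.conf.bak']
print(skipped)  # [b'\xe9t\xe9']
```

The catch is that a silently skipped file never shows up in the manifest at all. Anything that ends up in the skipped list probably still needs to be reported somewhere.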
It's not uncommon in my experience for users to drop odd files into
PGDATA (usually versioned copies of postgresql.conf, etc.), but I agree
that it should be discouraged. Even so, I don't recall ever seeing any
non-ASCII filenames.
Skipping files sounds scary, I'd prefer an error or a warning (and then
base64 encode the filename).
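Here is a minimal sketch of the warn-then-base64 idea, assuming a JSON-style manifest. The key names "path" and "path_base64" and the helpers manifest_entry and entry_to_bytes are hypothetical and chosen only for illustration. They are not meant to describe any actual manifest format.

```python
import base64
import warnings


def manifest_entry(name: bytes) -> dict:
    """Return a manifest entry for a raw file name (hypothetical key names).

    Valid UTF-8 names are stored as-is under "path". Any other name emits
    a warning and is stored base64-encoded under "path_base64", so the
    manifest itself always remains valid UTF-8.
    """
    try:
        return {"path": name.decode("utf-8")}
    except UnicodeDecodeError:
        warnings.warn(f"file name {name!r} is not valid UTF-8; storing it base64-encoded")
        return {"path_base64": base64.b64encode(name).decode("ascii")}


def entry_to_bytes(entry: dict) -> bytes:
    """Recover the original file name bytes from a manifest entry."""
    if "path" in entry:
        return entry["path"].encode("utf-8")
    return base64.b64decode(entry["path_base64"])


print(manifest_entry(b"base/1/1259"))  # {'path': 'base/1/1259'}
```

In the common case the manifest stays human-readable. Because base64 output is pure ASCII, the odd names still round-trip exactly. To treat such names as an error rather than a warning, replace the warnings.warn() call with a raise.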
Regards,
--
-David
da...@pgmasters.net