On 2020-01-23 18:04, Robert Haas wrote:
Now, you might say "well, why don't we just do an encoding
conversion?", but we can't. When the filesystem tells us what the file
names are, it does not tell us what encoding the person who created
those files had in mind. We don't know that they had *any* encoding in
mind. IIUC, a file in the data directory can have a name that consists
of any sequence of bytes whatsoever, so long as it doesn't contain
prohibited characters like a path separator or \0 byte. But only some
of those possible octet sequences can be stored in a manifest that has
to be valid UTF-8.

I think it wouldn't be unreasonable to require that file names in the database directory be consistently encoded (as defined by pg_control, probably). After all, this information is sometimes also shown in system views, so it's already difficult to process total junk. In practice, this shouldn't be an onerous requirement.
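
To make the constraint concrete, here is a minimal sketch (not taken from any patch) of the check a backup tool could run over raw file names before writing them into a UTF-8 manifest. It assumes names arrive as byte strings, for example from os.listdir() called with a bytes path; the helper names is_valid_utf8 and undecodable_names are made up for illustration.

```python
def is_valid_utf8(name: bytes) -> bool:
    """Return True if the raw file name is a well-formed UTF-8 sequence."""
    try:
        name.decode("utf-8")  # strict decoding rejects malformed sequences
    except UnicodeDecodeError:
        return False
    return True


def undecodable_names(names):
    """Return the file names that could not be stored in a UTF-8 manifest."""
    return [n for n in names if not is_valid_utf8(n)]


if __name__ == "__main__":
    # Hypothetical sample names; the second is "café" encoded as Latin-1.
    samples = [b"PG_VERSION", b"caf\xe9", "café".encode("utf-8")]
    for bad in undecodable_names(samples):
        print("not valid UTF-8:", bad)
```

Note that this only tells us whether a name happens to be valid UTF-8, not which encoding its creator intended, which is why a consistency requirement would still be needed.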

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

