On 2020-01-23 18:04, Robert Haas wrote:
Now, you might say "well, why don't we just do an encoding conversion?", but we can't. When the filesystem tells us what the file names are, it does not tell us what encoding the person who created those files had in mind. We don't know that they had *any* encoding in mind. IIUC, a file in the data directory can have a name that consists of any sequence of bytes whatsoever, so long as it doesn't contain prohibited characters like a path separator or \0 byte. But only some of those possible octet sequences can be stored in a manifest that has to be valid UTF-8.
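To make the quoted point concrete, here is a minimal Python sketch (not PostgreSQL code; the helper name fits_in_utf8_manifest is made up for illustration). It shows the same visible name encoded two ways, only one of which is a valid UTF-8 octet sequence:

```python
def fits_in_utf8_manifest(name: bytes) -> bool:
    """Return True if the raw file name bytes form valid UTF-8.

    Hypothetical helper for illustration only.
    """
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The same visible name, as two different byte sequences on disk.
latin1_name = "café".encode("latin-1")  # b'caf\xe9'
utf8_name = "café".encode("utf-8")      # b'caf\xc3\xa9'

print(fits_in_utf8_manifest(latin1_name))  # False
print(fits_in_utf8_manifest(utf8_name))    # True
```

Nothing in the bytes themselves says which encoding was intended, which is why a blind conversion is not possible.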
I think it wouldn't be unreasonable to require that file names in the database directory be consistently encoded (as defined by pg_control, probably). After all, this information is sometimes also shown in system views, so it's already difficult to process total junk. In practice, this shouldn't be an onerous requirement.
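As a rough sketch of what checking such a requirement could look like (Python, not PostgreSQL code; the function names are hypothetical, and the expected encoding is simply passed in as a parameter, since where it would come from is left open here):

```python
import os
from typing import Iterable, List


def undecodable_names(names: Iterable[bytes], encoding: str) -> List[bytes]:
    """Return the names that are not valid in the given encoding."""
    bad = []
    for name in names:
        try:
            name.decode(encoding)
        except UnicodeDecodeError:
            bad.append(name)
    return bad


def check_directory(directory: str, encoding: str) -> List[bytes]:
    """Walk a directory using raw byte names and report offending paths."""
    bad = []
    # Passing bytes to os.walk makes it yield bytes, so no decoding is
    # attempted before we get to inspect the names ourselves.
    for root, dirs, files in os.walk(os.fsencode(directory)):
        for name in undecodable_names(dirs + files, encoding):
            bad.append(os.path.join(root, name))
    return bad
```

Note that a check like this is only meaningful for encodings that can reject input. With a single-byte encoding such as Latin-1, every byte sequence decodes successfully, so the check would never report anything.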
-- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services