On Thu, Jan 23, 2020 at 02:23:14PM -0300, Alvaro Herrera wrote:
> On 2020-Jan-23, Robert Haas wrote:
>
> > No, that's not it. Suppose that Álvaro Herrera has some custom
> > settings he likes to put on all the PostgreSQL clusters that he uses,
> > so he creates a file álvaro.conf and uses an "include" directive in
> > postgresql.conf to suck in those settings. If he also likes UTF-8,
> > then the file name will be stored in the file system as a 12-byte
> > value of which the first two bytes will be 0xc3 0xa1. In that case,
> > everything will be fine, because JSON is supposed to always be UTF-8,
> > and the file name is UTF-8, and it's all good. But suppose he instead
> > likes LATIN-1.
>
> I do have files with Latin-1-encoded names in my filesystem, even though
> my system is UTF-8, so I understand the problem. I was wondering if it
> would work to encode any non-UTF8-valid name using something like
> base64; the encoded name will be plain ASCII and can be put in the
> manifest, probably using a different field of the JSON object -- so for
> a normal file you'd have { path => '1234/2345' } but for a
> Latin-1-encoded file you'd have { path_base64 => '4Wx2YXJvLmNvbmYK' }.
> Then it's the job of the tool to ensure it decodes the name to its
> original form when creating/querying for the file.
>
> A problem I have with this idea is that this is very corner-casey, so
> most tool implementors will never realize that there's a need to decode
> certain file names.
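A minimal Python sketch of the scheme in the quoted message, assuming file names are handled as raw bytes: a name that decodes as UTF-8 goes in `path`, anything else goes in `path_base64`, and the reader checks which key is present to recover the original bytes. The function names are hypothetical, and the keys are only the ones used in the quoted example, not a settled manifest format.

```python
import base64
import json


def manifest_entry(name: bytes) -> dict:
    """Hypothetical encoder: 'path' for valid UTF-8 names, 'path_base64' otherwise."""
    try:
        return {"path": name.decode("utf-8")}
    except UnicodeDecodeError:
        # Not valid UTF-8 (e.g. a Latin-1 name): store an ASCII-only encoding.
        return {"path_base64": base64.b64encode(name).decode("ascii")}


def entry_to_name(entry: dict) -> bytes:
    """Hypothetical decoder: recover the on-disk bytes from either key."""
    if "path_base64" in entry:
        return base64.b64decode(entry["path_base64"])
    return entry["path"].encode("utf-8")


utf8_name = "álvaro.conf".encode("utf-8")      # 12 bytes, starting 0xc3 0xa1
latin1_name = "álvaro.conf".encode("latin-1")  # 11 bytes, starting 0xe1

for name in (utf8_name, latin1_name):
    entry = manifest_entry(name)
    print(json.dumps(entry))
    # Every entry must round-trip to exactly the original bytes.
    assert entry_to_name(entry) == name
```

Encoding the 11-byte Latin-1 name by itself gives 4Wx2YXJvLmNvbmY=; the value in the quoted example decodes to the same bytes followed by a newline (0x0a).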
Another idea is to use base64 for all non-ASCII file names, so we don't
need to check whether the file name is valid UTF-8 before outputting it;
we just need to check for non-ASCII bytes, which is much easier.  Another
problem, though, is how do you _flag_ file names as being base64-encoded?
Use another JSON field to specify that?

-- 
  Bruce Momjian  <br...@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +
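The variant suggested in the reply (base64 for every non-ASCII name) can be sketched the same way, again in Python and assuming raw-byte file names. The only test needed is whether every byte is below 0x80, which `bytes.isascii()` (Python 3.7+) does without any UTF-8 validation. For the flagging question, this sketch keeps the name in `path` and adds a hypothetical `path_encoding` field; a separate `path_base64` key, as in the quoted proposal, would work equally well.

```python
import base64


def entry_for_name(name: bytes) -> dict:
    """Hypothetical encoder: plain 'path' for ASCII names, base64 plus a flag otherwise."""
    if name.isascii():  # simple per-byte range check, no UTF-8 decoding
        return {"path": name.decode("ascii")}
    return {
        "path": base64.b64encode(name).decode("ascii"),
        "path_encoding": "base64",  # hypothetical flag field
    }


def name_for_entry(entry: dict) -> bytes:
    """Hypothetical decoder: honor the flag, otherwise treat 'path' as ASCII."""
    if entry.get("path_encoding") == "base64":
        return base64.b64decode(entry["path"])
    return entry["path"].encode("ascii")


for name in (b"1234/2345",
             "álvaro.conf".encode("utf-8"),
             "álvaro.conf".encode("latin-1")):
    assert name_for_entry(entry_for_name(name)) == name
```

One consequence of this design: a valid UTF-8 name such as álvaro.conf is also base64-encoded, so the flagged form shows up in ordinary manifests and tool authors are less likely to overlook it.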