On Sun, Feb 10, 2013 at 06:47:30PM -0500, Tom Lane wrote:
> Noah Misch <n...@leadboat.com> writes:
> > Following some actual testing, I see that we treat postgresql.conf values as
> > byte sequences; any reinterpretation as encoded text happens later. Hence,
> > contrary to my earlier suspicion, your patch does not make that situation
> > worse. The present situation is bad; among other things, current_setting() is
> > a vector for injecting invalid text data. But unconditionally validating
> > postgresql.conf values in the platform encoding would not be an improvement.
> > Suppose you have a UTF-8 platform encoding and KOI8R databases. You may wish
> > to put KOI8R strings in a GUC, say search_path. That's possible today; if we
> > required that postgresql.conf conform to the platform encoding and no other,
> > it would become impossible. This area warrants improvement, but doing so will
> > entail careful design.
>
> The key problem, ISTM, is that it's not at all clear what encoding to
> expect the incoming data to be in. I'm concerned about trying to fix
> that by assuming it's in some "platform encoding" --- for one thing,
> while that might be a well-defined concept on Windows, I don't believe
> it is anywhere else.

GetPlatformEncoding() imposes a sufficiently-portable definition. I just
don't think that definition leads to a value that can be presumed desirable
and adequate for postgresql.conf.

> If we knew that postgresql.conf was stored in, say, UTF8, then it would
> probably be possible to perform encoding conversion to get string
> variables into the database encoding. Perhaps we should allow some
> magic syntax to tell us the encoding of a config file?
>
>     file_encoding = 'utf8'  # must precede any non-ASCII in the file
>
> There would still be a lot of practical problems to solve, like what to
> do if we fail to convert some string into the database encoding. But at
> least the problems would be somewhat well-defined.

Agreed. That's a promising direction.

> While we're thinking about this, it'd be nice to fix our handling (or
> rather lack of handling) of encoding considerations for database names,
> user names, and passwords. I could imagine adding some sort of encoding
> marker to connection request packets, which could fix the
> don't-know-the-encoding problem as far as incoming data is concerned.

That deserves a TODO entry under Wire Protocol Changes to avoid losing it.

> But how
> shall we deal with storing the strings in shared catalogs, which have to
> be readable from multiple databases possibly of different encodings?

I suppose we would pick an encoding sufficient for all values we intend to
support (UTF8? MULE_INTERNAL?), then store the data in that encoding using
either bytea or a new type, say "omnitext".

Thanks,
nm
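
For illustration only, here is a minimal Python sketch of the two points in this thread: a KOI8R string is generally not valid UTF-8, so validating postgresql.conf against a UTF-8 platform encoding would reject it, and a declared file encoding would make conversion to the database encoding a well-defined step that can fail in a well-defined way. This is not PostgreSQL code. The helpers is_valid_in() and convert_setting() are hypothetical names, and Python's codecs stand in for the server's conversion routines.

```python
# Sketch only: Python codecs stand in for PostgreSQL's encoding conversion.
# is_valid_in() and convert_setting() are hypothetical helpers, not server APIs.

def is_valid_in(raw: bytes, encoding: str) -> bool:
    """Return True if the byte sequence decodes cleanly in `encoding`."""
    try:
        raw.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


def convert_setting(raw: bytes, file_encoding: str, database_encoding: str) -> bytes:
    """Re-encode one config value from the file's declared encoding into
    the database encoding; raise ValueError if either step fails."""
    try:
        text = raw.decode(file_encoding)
    except UnicodeDecodeError as exc:
        raise ValueError(f"value is not valid {file_encoding}") from exc
    try:
        return text.encode(database_encoding)
    except UnicodeEncodeError as exc:
        raise ValueError(f"value has no {database_encoding} representation") from exc


# A schema name as it would be stored for a KOI8R database.
koi8_value = "схема".encode("koi8_r")

# Validating those bytes against a UTF-8 platform encoding rejects them.
print(is_valid_in(koi8_value, "utf-8"))  # False

# With a declared file encoding, the same name written in UTF-8 converts cleanly.
utf8_value = "схема".encode("utf-8")
print(convert_setting(utf8_value, "utf-8", "koi8_r") == koi8_value)  # True

# The open question from the thread: some strings cannot be converted at all.
try:
    convert_setting("日本".encode("utf-8"), "utf-8", "koi8_r")
except ValueError as err:
    print("conversion failed:", err)
```

What the server should do in the last case (reject the file, skip the setting, or defer the error until a session needs the value) is exactly the design question left open above; the sketch just raises.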