Hi,

On Thu, Aug 23, 2018 at 07:02:27PM -0400, Viktor Dukhovni wrote:

> Absent any indication of character set from the client, there was
> no way to know what encoding any particular non-ASCII octet
> string may be using, so the code was optimized to avoid spurious
> database string conversion errors, by using an encoding that
> would accept any octet-string, garbage-in -> garbage-out.
Unless I misunderstand you (and I might well be doing so), I don't
think LATIN1 works that way in Postgres.  The backend's encoding and
the front end's need to be translation-compatible, but that would be
true of a SQL_ASCII back end with literally any front-end encoding
(no conversion is performed when the back end is SQL_ASCII).  If
you're going to get multibyte strings, I don't see how LATIN1 is any
less likely than UTF8 to throw errors (and it is slightly more likely
to, if you have a UTF-8 multi-octet code point that maps into
ISO 8859-1 space in the wrong way, or if the target environment
doesn't actually use ISO 8859-1; it's true that the characters in
that case will probably map, but that doesn't help, since the match
could as easily be wrong as right).

> This means that we'd need a way to dynamically update the client
> encoding of the database connection to UTF8 when appropriate
> and revert it to LATIN1 when the client encoding is unspecified.

You can do this in libpq and also in commands passed on a regular
connection:

    SET CLIENT_ENCODING TO 'value';

(see https://www.postgresql.org/docs/10/static/multibyte.html).  I
forget whether this can be done inside a transaction, but it seems a
fabulously bad idea to change encodings mid-transaction anyway.  (A
rough libpq sketch is in the P.S. below.)

> And this needs to work across the proxymap protocol.

The problem could well be here; I'm insufficiently familiar with its
internals to comment.

Best regards,

A

-- 
Andrew Sullivan
a...@anvilwalrusden.com
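P.S.  In case it helps, here is a rough, untested sketch of what the
encoding switch looks like from plain libpq.  This is not Postfix
code; the DSN and the client_announced_utf8 flag are placeholders I
made up for illustration.

/*
 * Rough sketch only: switch the client encoding of an existing libpq
 * connection to UTF8 when the client announced UTF-8, and back to
 * LATIN1 otherwise.
 */
#include <stdio.h>
#include <libpq-fe.h>

static int set_client_encoding(PGconn *conn, const char *enc)
{
    /*
     * PQsetClientEncoding() issues the equivalent of
     * SET CLIENT_ENCODING TO '...'; it returns 0 on success.
     */
    if (PQsetClientEncoding(conn, enc) != 0) {
        fprintf(stderr, "cannot set client_encoding to %s: %s",
                enc, PQerrorMessage(conn));
        return -1;
    }
    return 0;
}

int main(void)
{
    PGconn *conn = PQconnectdb("dbname=mail");      /* placeholder DSN */

    if (PQstatus(conn) != CONNECTION_OK) {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return 1;
    }

    int client_announced_utf8 = 0;                  /* placeholder flag */

    (void) set_client_encoding(conn,
                               client_announced_utf8 ? "UTF8" : "LATIN1");

    /* ... lookups happen here ... */

    PQfinish(conn);
    return 0;
}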