Hi,

On Thu, Aug 23, 2018 at 07:02:27PM -0400, Viktor Dukhovni wrote:
 
> Absent any indication of character set from the client, there was
> no way to know what encoding any particular non-ASCII octet
> string may be using, so the code was optimized to avoid spurious
> database string conversion errors, by using an encoding that
> would accept any octet-string, garbage-in -> garbage-out.

Unless I misunderstand you (and I might well be doing so), I don't
think LATIN1 works that way in Postgres.  The backend's encoding and
the frontend's need to be translation-compatible, but that would be
true of a SQL_ASCII backend with literally any frontend encoding (no
conversion is performed to or from a SQL_ASCII backend).  If you're
going to get multibyte strings, I don't see how LATIN1 is any less
likely to throw errors than UTF8 (and it is slightly more likely if
you have a UTF-8 multi-octet code point that maps into ISO 8859-1
space the wrong way, or if the target environment fails to use ISO
8859-1 at all.  It's true that in that case the characters will
probably map without error, but the result is still a failure, since
the mapping could as easily be wrong as right).
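
To make the failure direction concrete, here's a sketch from psql,
assuming a UTF8 backend (the exact error wording varies by server
version):

    SET CLIENT_ENCODING TO 'LATIN1';
    SELECT chr(8364);  -- U+20AC, the Euro sign; fine in UTF8 storage
    -- ERROR:  character with byte sequence 0xe2 0x82 0xac in encoding
    -- "UTF8" has no equivalent in encoding "LATIN1"

So a LATIN1 client against a UTF8 backend throws on output whenever
the stored text strays outside ISO 8859-1.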

> This means that we'd need a way to dynamically update the client
> encoding of the database connection to UTF8 when appropriate
> and revert it to LATIN1 when the client encoding is unspecified.

You can do this in libpq, and also with commands sent over a regular connection:

    SET CLIENT_ENCODING TO 'value';

(see https://www.postgresql.org/docs/10/static/multibyte.html).  I
forget whether this can be done inside a transaction, but it seems a
fabulously bad idea to change encodings mid-transaction anyway.
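
For what it's worth, it can: SET is accepted inside a transaction
block, and the SET LOCAL variant confines the change to the current
transaction, reverting at COMMIT or ROLLBACK.  (libpq also exposes
this as PQsetClientEncoding().)  A minimal sketch, assuming a session
that otherwise runs in LATIN1:

    BEGIN;
    SET LOCAL CLIENT_ENCODING TO 'UTF8';  -- lasts until transaction end
    -- ... queries that need UTF8 here ...
    COMMIT;
    SHOW client_encoding;                 -- back to the session default

Whether doing so mid-transaction is wise is, as you say, another
matter.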

> And this needs to work across the proxymap protocol.

The problem could well be here; I'm insufficiently familiar with its
internals to comment.

Best regards,

A

-- 
Andrew Sullivan
a...@anvilwalrusden.com
