Re: [GENERAL] encoding of PostgreSQL messages

Karsten Hilbert Wed, 31 Dec 2008 08:58:13 -0800

> Karsten Hilbert <karsten.hilb...@gmx.net> writes:
> > On Mon, Dec 29, 2008 at 09:07:14AM -0300, Alvaro Herrera wrote:
> >> And I'm now wondering if we should delay initializing the translation
> >> stuff until after client_encoding has been reported.
> 
> > Or else
> 
> > - just don't pass those messages through gettext so they are
> >   always in 7 bit ASCII English
> 
> What's the difference?  The user-visible result would be the same
> AFAICS.  (One or the other might be less messy internally, but I'm
> not sure which offhand.)


That was the reason for the suggestion: perhaps less messy and surely
lower impact on the existing code, as it would not mean moving code
later in the initialization but rather just removing the gettext
wrappers around a few strings. No difference in the result.

The difference from my other suggestion (no translation vs. translation
followed by replacing characters > 127 with, say, '?' or a space) is:

I could *assume* a given encoding, namely 7 bit ASCII. Or rather I could
assume that I can display the message as "something pretty similar to
what the original message said, perhaps without umlauts and accents but
still recognizable in the local language".

Now, surely, I could dig down the layers to where "my application space"
receives the message from PostgreSQL and filter there. It is, however,
good to have some knowledge of the encoding where knowledge can be had.

The concrete problem is this: I connect to PostgreSQL from Python. Let's
assume PG is set to German. If the wrong password is supplied, the PG
error message string contains an umlaut. This is passed to libpq, which
in turn passes it to the C part of psycopg2, which then turns it into an
exception. By default, Python prints an exception to the console, which
may use an encoding incompatible with the latin1 the PG message happens
to be in. Thus, printing the PG message may or may not fail due to
Unicode decoding or encoding errors.

The solution is to find the right layer to take control of the encoding,
but ultimately this is only possible if the encoding is *known*. Thus
the plea for "7-bit-ascii English by default until the encoding *can* be
known". Going to "7-bit-ascii filter of the original by default until
the encoding can be known" only tries to preserve a bit more of the
original language. I may be wrong about the feasibility.
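
As a rough sketch of what such a 7-bit filter could look like on the
receiving side, again in plain Python: ascii_filter() is a hypothetical
helper, not part of PostgreSQL, libpq or psycopg2, and the German text
is made up for the example. It works on raw bytes, so it needs no
knowledge of the encoding they are in.

```python
def ascii_filter(data: bytes, placeholder: str = "?") -> str:
    """Keep 7-bit ASCII bytes as-is; replace every byte above 127 with a placeholder."""
    return "".join(chr(b) if b < 128 else placeholder for b in data)


# Made-up German text, encoded as latin1 for the sake of the example.
raw = "Passwort-Authentifizierung für Benutzer fehlgeschlagen".encode("latin-1")

filtered = ascii_filter(raw)

# The result is pure ASCII, so it can be written to a console in any
# ASCII-compatible encoding without raising.
filtered.encode("ascii")
```

With latin1 input each accented character becomes a single placeholder.
With a multi-byte encoding such as UTF-8 one character would turn into
several placeholders, so the result is less pretty but still printable.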

Thanks for considering,
Karsten