On 1/12/2009 7:36 PM, DM wrote:

select convert('Vall\xe9e','LATIN1','UTF8');

You expect the result 'Vallée', right? As in the utf-8 bytes

  Vall\xc3\xa9e

?

ERROR:  invalid byte sequence for encoding "UTF8": 0xe965
HINT: This error can also happen if the byte sequence does not match the
encoding expected by the server, which is controlled by "client_encoding".

The HINT tells you what's going on here.

The error isn't being raised by convert(...). You'll get the same error if you just:

    select 'Vall\xe9e';

... because the problem is that the server is trying to treat that literal as a string in your client_encoding, which is UTF-8, and convert it to the server encoding.

You will note that the convert(...) function takes a 'bytea' argument rather than 'text'. That's because PostgreSQL string literals are expected to be valid text in the current encoding. Entering a string literal with a byte sequence invalid in the current encoding will be problematic. You need to supply a valid bytea value, which involves juggling some interesting double-escaping and using octal escapes for characters.

The octal escape for the byte 0xe9 is \351, so to correctly submit your query to PostgreSQL you'd need to send:

  select convert(E'Vall\\351e','LATIN1','UTF8');

or if you really just want to convert to the server encoding:

  select convert_from(E'Vall\\351e','LATIN-1');

Here, E'' is used for the string to tell PostgreSQL to interpret escapes, and \\351 is used as the octal escape for 0xe9. We double the backslashes because the first backslash gets used up parsing the literal, and we want the bytea value to actually be 'Val\351e' after string literal parsing.

Instead, if standard_conforming_strings is on, you can omit the E'' and use single backslashes.

Note that in 8.5 and above you could use E'Val\\xe9e' instead of the octal form, as 8.5 understands hex escapes in bytea values.

I guess bytea is kind of ugly to use in a text-based protocol :-( because the query string has to be valid text in the client_encoding, so bytea values containing data not valid in the client encoding must be escaped in the parsed query string. There's nothing much to be done about it, though.

--
Craig Ringer

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Reply via email to