Thanks, all, for the information. In summary:
- 8.0 wasn't very strict: it let the illegal values in instead of
mapping them into UTF-8
- the illegal values can be stripped with iconv -c
- 8.2 should be stricter
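Before the upgrade it can help to see which rows are affected and then drop the bad bytes, much as iconv -c does. The following is a minimal Python sketch, not the iconv tool itself. It assumes the data is available as raw bytes (for example, lines of a plain-text dump), and the helper names are hypothetical:

```python
def find_invalid_lines(lines):
    """Yield (line_number, line) for each bytes line that is not valid UTF-8."""
    for n, line in enumerate(lines, start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            yield n, line


def strip_invalid_utf8(raw: bytes) -> bytes:
    """Drop byte sequences that are not valid UTF-8 and keep everything else.

    Roughly analogous to `iconv -f UTF-8 -t UTF-8 -c`, which omits
    characters that cannot be converted.
    """
    # errors="ignore" silently discards undecodable bytes.
    return raw.decode("utf-8", errors="ignore").encode("utf-8")


if __name__ == "__main__":
    dump = [b"ok\n", b"caf\xe9\n", b"caf\xc3\xa9\n"]
    for n, line in find_invalid_lines(dump):
        print(n, line, "->", strip_invalid_utf8(line))
```

Note that stripping loses the character entirely ("café" becomes "caf"). If the stray bytes are known to be LATIN1, converting them is an alternative to deleting them.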
I'm in the midst of my upgrade to 8.2 now; hopefully the LATIN1->UTF8
On May 17, 2007, at 16:47, PFC wrote:
and put that in the form. Instead of being mapped to their 2-byte UTF-8
equivalents, they are going into the database directly as single-byte
values > 127, that is, as invalid UTF-8.
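The difference between a correct LATIN1->UTF-8 conversion and what ended up stored can be illustrated in a few lines. This is a minimal Python sketch with hypothetical helper names. It spells out the bit-level mapping for a single LATIN1 byte and checks it against Python's own codecs. A lone byte > 127 (here 0xE9, "é") is not valid UTF-8 on its own:

```python
def latin1_byte_to_utf8(b: int) -> bytes:
    """Map one LATIN1 (ISO-8859-1) byte to its UTF-8 encoding.

    Bytes 0x00-0x7F are plain ASCII and stay one byte. Bytes 0x80-0xFF
    become two bytes: 110xxxxx 10xxxxxx (lead byte 0xC2 or 0xC3).
    """
    if b < 0x80:
        return bytes([b])
    return bytes([0xC0 | (b >> 6), 0x80 | (b & 0x3F)])


def latin1_to_utf8(raw: bytes) -> bytes:
    """Convert a whole LATIN1 byte string to UTF-8, byte by byte."""
    return b"".join(latin1_byte_to_utf8(b) for b in raw)


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    form_value = b"caf\xe9"  # "café" as submitted from an ISO-8859-1 form
    converted = latin1_to_utf8(form_value)
    print(converted)                  # b'caf\xc3\xa9'
    print(is_valid_utf8(form_value))  # False: the unconverted single byte
    print(is_valid_utf8(converted))   # True
    # The hand-written mapping agrees with Python's built-in codecs.
    print(converted == form_value.decode("latin-1").encode("utf-8"))  # True
```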
Sometimes you also get HTML entities in the mix. Who kn
I have a small database (PgSQL 8.0, database encoding UTF8) that folks
are inserting into via a web form. The form itself is declared
ISO-8859-1, and prior to inserting any data, pg_client_encoding is
set to LATIN1.
Wouldn't it be simpler to have the browser submit the form in
Paul Ramsey wrote:
> I have a small database (PgSQL 8.0, database encoding UTF8) that folks
> are inserting into via a web form. The form itself is declared
> ISO-8859-1, and prior to inserting any data, pg_client_encoding is
> set to LATIN1.
>
> Most of the high-bit characters are correctly tr
Most of the high-bit characters are correctly translated from LATIN1 to