On 27/10/06, Thomas H. <[EMAIL PROTECTED]> wrote:
FYI, prior to 8.2 there is another source of bad UTF8 byte sequences:
when tsearch2 was used on utf8 content in versions before 8.2, it generated
bad utf8 sequences. as tsearch2 lowercases each char in the text it's
indexing, it did the same with multibyte characters... unfortunately
taking each byte separately, so it seems. the utf8 representations of
german umlauts (äöü) are some examples of characters that were turned into
invalid sequences.
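
To illustrate the failure mode described above, here is a minimal Python sketch. It is not tsearch2's actual code: it assumes the per-byte lowercasing behaved like a Latin-1 (single-byte) locale, which is only a guess consistent with the "so it seems" above. The helper names bytewise_lower and is_valid_utf8 are made up for this example.

```python
def bytewise_lower(data: bytes) -> bytes:
    """Lowercase every byte on its own, the way a single-byte locale would.

    Assumption: Latin-1 stands in for whatever single-byte locale was active.
    """
    return data.decode("latin-1").lower().encode("latin-1")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


for word in ("Ärger", "öl", "über"):
    original = word.encode("utf-8")
    mangled = bytewise_lower(original)
    # The lead byte 0xC3 is the Latin-1 letter "Ã", so it becomes 0xE3 ("ã").
    # 0xE3 announces a three-byte sequence, but only one continuation byte
    # follows, so the result is no longer well-formed UTF-8.
    print(word, original.hex(), mangled.hex(), is_valid_utf8(mangled))
```

Plain ASCII text passes through such a routine unharmed, which would explain why only content with non-ASCII characters was affected.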
this data could be successfully pg_dump'ed, but not pg_restore'd. in 8.2,
this looks fixed. to upgrade from 8.1.5 to 8.2b1 we had to remove all
tsearch2 index data, dump the db, restore the db in 8.2 and recreate the
indices.
You need to initdb with utf8 and then install tsearch2 with utf8. Both
the cluster and tsearch2 need utf8. I had a similar problem. Perhaps your
8.1 postgres cluster wasn't utf8?
- thomas
----- Original Message -----
From: "Jeff Davis" <[EMAIL PROTECTED]>
To: <pgsql-bugs@postgresql.org>
Sent: Saturday, October 28, 2006 12:38 AM
Subject: Re: [BUGS] COPY fails on 8.1 with invalid byte sequences in text
> On Fri, 2006-10-27 at 14:42 -0700, Jeff Davis wrote:
>> It seems to be essentially a data corruption issue if applications
>> insert binary data in text fields using escape sequences. Shouldn't
>> PostgreSQL reject an invalid UTF8 sequence in any text type?
>>
>
> Another note: PostgreSQL rejects invalid UTF8 sequences in other
> contexts. For instance, if you use PQexecParams() and insert using type
> text and any format (text or binary), it will reject invalid sequences.
> It will of course allow anything to be sent when the type is bytea.
>
> Also, I thought I'd publish the workaround that I'm using.
>
> I created a function that seems to work for validating text data as
> being valid UTF8.
>
> CREATE OR REPLACE FUNCTION valid_utf8(TEXT) returns BOOLEAN
> LANGUAGE plperlu AS
> $valid_utf8$
> use utf8;
> # utf8::decode() returns false if its argument is not well-formed UTF-8
> return utf8::decode($_[0]) ? 1 : 0;
> $valid_utf8$;
>
> I just add a check constraint on all of my text attributes in all of my
> tables. Not fun, but it works.
>
> Regards,
> Jeff Davis
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend
>
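
The check constraint quoted above validates on the server side. As a rough client-side counterpart, the sketch below screens raw byte strings before they are sent or loaded. It assumes a Python application that still holds the data as bytes; find_invalid_utf8 is a hypothetical helper, and Python's strict UTF-8 decoder may not match PostgreSQL's validation rules in every edge case.

```python
from typing import Iterable, List, Tuple


def find_invalid_utf8(values: Iterable[bytes]) -> List[Tuple[int, int]]:
    """Return (value index, byte offset) for each value that is not valid UTF-8.

    The byte offset is where Python's decoder first reports a problem.
    """
    problems = []
    for index, raw in enumerate(values):
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError as exc:
            problems.append((index, exc.start))
    return problems


rows = [b"plain ascii", "äöü".encode("utf-8"), b"ab\xff", b"\xff"]
for index, offset in find_invalid_utf8(rows):
    print(f"row {index}: invalid byte sequence at offset {offset}")
```

This only reports offending values; whether to reject, repair, or store them as bytea is left to the application.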
---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?
http://archives.postgresql.org