Re: [HACKERS] Bug in UTF8-Validation Code?

Tatsuo Ishii Wed, 04 Apr 2007 14:54:46 -0700

> Tatsuo Ishii wrote:
> 
> > <SNIP>. I think we need to continue the design discussion, probably
> > targeting 8.4, not 8.3.
> 
> The discussion came about because Andrew - Supernews noticed that chr() 
> returns invalid utf8, and we're trying to fix all the bugs with invalid 
> utf8 in the system.  Something needs to be done, even if we just check 
> the result of the current chr() implementation and throw an error on 
> invalid results.  But do we want to make this minor change for 8.3 and 
> then change it again for 8.4?


My opinion was in the part of my previous mail that you snipped:
limit chr() to the ASCII range.
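
To make the two options in this thread concrete, here is a rough sketch
in Python (not the backend code). It assumes, as the 0x80 error in the
example suggests, that chr() currently emits a single byte with the
given value. The function names are hypothetical and exist only for
illustration.

```python
def chr_ascii_only(code: int) -> bytes:
    """Option 1 (hypothetical): accept only the ASCII range 0..127."""
    if not 0 <= code <= 127:
        raise ValueError(f"chr() argument outside ASCII range: {code}")
    return bytes([code])


def chr_checked(code: int) -> bytes:
    """Option 2 (hypothetical): build the single byte, then reject it
    if that byte on its own is not valid UTF-8."""
    if not 0 <= code <= 255:
        raise ValueError(f"chr() argument outside byte range: {code}")
    raw = bytes([code])
    try:
        raw.decode("utf-8")  # validation only; the decoded text is discarded
    except UnicodeDecodeError as exc:
        raise ValueError(f"chr({code}) is not valid UTF-8") from exc
    return raw
```

Under that single-byte assumption both functions accept exactly the
same inputs, because every lone byte from 0x80 to 0xFF is invalid
UTF-8. The difference would only show up if chr() were later extended
to produce multibyte sequences.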
--
Tatsuo Ishii
SRA OSS, Inc. Japan

> Here's an example of the current problem.  It's an 8.2.3 database with 
> utf8.en_US encoding
> 
> 
> mark=# create table testutf8 (t text);
> CREATE TABLE
> mark=# insert into testutf8 (t) (select chr(gs) from generate_series(0,255) as gs);
> INSERT 0 256
> mark=# \copy testutf8 to testutf8.data
> mark=# truncate testutf8;
> TRUNCATE TABLE
> mark=# \copy testutf8 from testutf8.data
> ERROR:  invalid byte sequence for encoding "UTF8": 0x80
> HINT:  This error can also happen if the byte sequence does not match 
> the encoding expected by the server, which is controlled by 
> "client_encoding".
> CONTEXT:  COPY testutf8, line 129
> 
> 
> 
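
The failure in the quoted example can be reproduced outside the
database. The following is a small Python sketch, again assuming that
each chr(gs) result is stored as one raw byte. The helper name is made
up for illustration.

```python
# One single-byte value per row, mirroring generate_series(0,255).
rows = [bytes([n]) for n in range(256)]


def first_invalid_utf8(values):
    """Return the index of the first value that is not valid UTF-8,
    or None if every value decodes cleanly."""
    for index, raw in enumerate(values):
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError:
            return index
    return None


bad_index = first_invalid_utf8(rows)        # 128, i.e. the byte 0x80
proper = chr(0x80).encode("utf-8")          # b"\xc2\x80", two bytes
```

Index 128 is the 129th row, which is consistent with the "line 129" in
the COPY error context. A correctly encoded U+0080 would be the two
bytes 0xC2 0x80, not the single byte 0x80.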

