Re: [HACKERS] invalidly encoded strings

Jeff Davis Mon, 10 Sep 2007 19:43:32 -0700

On Tue, 2007-09-11 at 11:27 +0900, Tatsuo Ishii wrote:
> > BTW, it strikes me that there is another hole that we need to plug in
> > this area, and that's the convert() function.  Being able to create
> > a value of type text that is not in the database encoding is simply
> > broken.  Perhaps we could make it work on bytea instead (providing
> > a cast from text to bytea but not vice versa), or maybe we should just
> > forbid the whole thing if the database encoding isn't SQL_ASCII.
> 
> Please don't do that. It will break a useful use case of convert().
> 
> A user has a database encoded in UTF-8. He has English, French,
> Chinese, and Japanese data in tables. To sort a table in the order
> appropriate to its language, he would do something like this:
> 
> SELECT * FROM japanese_table ORDER BY convert(japanese_text using utf8_to_euc_jp);
> 
> Without using convert(), he will get the data in random order. This is
> because Kanji characters are in random order in UTF-8, while Kanji
> characters are reasonably ordered in EUC_JP.


Isn't the collation a locale issue, not an encoding issue? Is there a
ja_JP.UTF-8 that defines the proper order?
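
For illustration only, here is a minimal Python sketch of the byte-order
point made in the quoted text. It is not PostgreSQL code: it sorts strings
by their encoded bytes in UTF-8 and in EUC_JP. The two kanji and the helper
name sort_by_encoding are chosen for this example and do not come from the
thread. It shows only that the two encodings can order the same characters
differently; it says nothing about what a ja_JP.UTF-8 locale would do.

```python
# Two kanji picked for illustration (an assumption, not from the thread).
words = ["一", "亜"]


def sort_by_encoding(strings, encoding):
    """Sort strings by the raw bytes they produce in the given encoding."""
    return sorted(strings, key=lambda s: s.encode(encoding))


# UTF-8 byte order follows Unicode code point order.
utf8_order = sort_by_encoding(words, "utf-8")

# EUC_JP byte order follows the JIS X 0208 layout of the characters.
eucjp_order = sort_by_encoding(words, "euc_jp")

# Show the encoded bytes next to each character, then both orderings.
for w in words:
    print(w, w.encode("utf-8").hex(), w.encode("euc_jp").hex())
print("UTF-8 byte order: ", utf8_order)
print("EUC_JP byte order:", eucjp_order)
```

The two lists come out in opposite orders for this pair, which is the
effect the convert() trick relies on. Whether a locale-aware collation
would give the desired order is a separate question.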

Regards,
        Jeff Davis

