In the last episode (Jun 30), Pooly said:
> 2008/6/30 Dan Nelson <[EMAIL PROTECTED]>:
> > In the last episode (Jun 29), Pooly said:
> >> Hi,
> >>
> >> I'm trying to convert my tables to UTF8 but I'm getting the
> >> following error: ERROR 1062 (23000): Duplicate entry 'Zorglüb' for
> >> key 1
> >>
> >> Not too sure why I'm getting this error since the current (latin1)
> >> data are:
> >>
> >> mysql> select * from topics_lookup where label like 'Zor%';
> >> +----------+----------+------+
> >> | label    | topic_id | main |
> >> +----------+----------+------+
> >> | Zorglub  |       72 |    0 |
> >> | Zorglüb  |       72 |    1 |
> >> +----------+----------+------+
> >> 2 rows in set (0.00 sec)
> >>
> >> There is a unique index on label, however the 2 data are different.
> >>
> >> Any ideas ?
> >
> > I can't reproduce this.  Can you provide example commands
> > demonstrating your problem?
>
> Yes, sorry I should have been more precise in my email.
>
> mysql> select version();
> +--------------------------+
> | version()                |
> +--------------------------+
> | 5.0.32-Debian_7etch5-log |
> +--------------------------+
> 1 row in set (0.00 sec)
>
> create table mytable2 ( label varchar(200) primary key ) charset latin1;
> insert into mytable2 values ('Zorglub'), ('Zorglüb');
> alter table mytable2 convert to character set utf8 collate utf8_general_ci;
>
> this gives:
> ERROR 1062 (23000): Duplicate entry 'Zorglüb' for key 1
>
> I tried to search the changelog and the bug tracking system, but
> without much luck.

MySQL's default collation is latin1_swedish_ci, which sorts ü along
with y.  utf8_general_ci sorts it along with u, so after the conversion
'Zorglub' and 'Zorglüb' compare equal and collide in the primary key:

http://www.collation-charts.org/mysql60/mysql604.latin1_swedish_ci.html
http://www.collation-charts.org/mysql60/mysql604.utf8_general_ci.european.html

More reading:

http://dev.mysql.com/doc/refman/5.0/en/charset-unicode-sets.html

  ...
  To further illustrate, the following equalities hold in both
  utf8_general_ci and utf8_unicode_ci (for the effect this has in
  comparisons or when doing searches, see Section 9.1.5.6, "Examples
  of the Effect of Collation"):

  Ä = A
  Ö = O
  Ü = U

http://dev.mysql.com/doc/refman/5.0/en/charset-collation-effect.html

  mysql> SELECT * FROM germanutf8 WHERE c = 'Bär';
  +------+
  | c    |
  +------+
  | Bar  |
  | Bär  |
  +------+
  ...
  This is not a bug but rather a consequence of the sorting that
  latin1_german1_ci or utf8_unicode_ci do (the sorting shown is done
  according to the German DIN 5007 standard).

-- 
	Dan Nelson
	[EMAIL PROTECTED]
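
P.S. To see which labels would clash before running the ALTER, here is a rough Python sketch of the idea. It is only an approximation: fold() strips accents and lowercases, which mimics the Ü = U behaviour quoted above but is not MySQL's actual utf8_general_ci weight table (ß, for instance, is handled differently). The names fold and find_collisions are made up for this example, and the label list is typed in by hand rather than read from the server.

```python
import unicodedata
from collections import defaultdict


def fold(label):
    """Approximate an accent- and case-insensitive comparison key.

    Assumption: this only imitates the 'Ü = U' style equalities; it is
    not MySQL's real utf8_general_ci implementation.
    """
    # Split accented letters into base letter + combining mark (ü -> u + ¨).
    decomposed = unicodedata.normalize("NFD", label)
    # Drop the combining marks, keeping only the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


def find_collisions(labels):
    """Group labels by folded key and return only groups with more than one member."""
    groups = defaultdict(list)
    for label in labels:
        groups[fold(label)].append(label)
    return {key: members for key, members in groups.items() if len(members) > 1}


if __name__ == "__main__":
    # The two values from the thread, the Bar/Bär pair from the manual,
    # and one unrelated made-up label that should not collide.
    labels = ["Zorglub", "Zorglüb", "Bar", "Bär", "Spirou"]
    for key, members in find_collisions(labels).items():
        print(key, "->", members)
```

Any group it reports is a candidate for the "Duplicate entry" error on a unique or primary key; the server's own collation remains the final authority.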