Play time is over and I am now building a couple of servers using Ubuntu 8.04 
and the latest git version of Koha. So far the installations have been clean 
and easy. I have just one more UTF-8 encoding question.

 

Being in the southwestern U.S., we are trying to deal with some titles that 
contain Spanish characters alongside our primarily English ones. In looking 
over the Koha Wiki entry for encoding and character sets, 
http://wiki.koha.org/doku.php?id=encodingscratchpad, I have a question about 
the section on combining characters and collations. The search entry may not 
necessarily have the special character, but we need to return records with 
special characters. For those who have had to deal with this, which would you 
recommend for the collation: utf8_unicode_ci or utf8_general_ci?
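
To make the requirement concrete (a search typed without the accent should still match a record that has it), here is a minimal Python sketch of accent-insensitive comparison. It is only an illustration of the idea, not how MySQL or Koha implement collations; the helper name fold_accents is hypothetical, and folding every diacritic (including the tilde on ñ) is an assumption that may not suit every catalog.

```python
import unicodedata

def fold_accents(text):
    """Hypothetical helper: reduce text to an accent- and case-insensitive key."""
    # NFD splits precomposed letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks (Unicode category Mn = nonspacing mark).
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # casefold() gives a case-insensitive form for comparison.
    return stripped.casefold()

# A search term without the accent matches the accented title word.
print(fold_accents("Univerzalitas") == fold_accents("Univerzalit\u00e1s"))  # True
```

A collation does this kind of folding inside the database at comparison time, so no separate key has to be stored.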

 

The word Univerzalitás is a Unicode combining form. When you copy/paste it 
into a text editor or use a keyboard to type it, it is most likely going to be 
the non-combining form: Univerzalitás. (In the combining form, the accented a 
is two code points: Hex 61, Hex 0301; in the non-combining form it is a single 
code point: Hex 00e1.) 

Non-combining form: http://www.fileformat.info/info/unicode/char/00e1/index.htm 

Combining form: http://www.fileformat.info/info/unicode/char/61/index.htm 
http://www.fileformat.info/info/unicode/char/0301/index.htm 

Univerzalitás Univerzalitás 
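
Because the two forms look identical on screen, it can help to build them explicitly from their code points. The following is a small Python sketch using the standard unicodedata module; the variable names are illustrative only, and it says nothing about how MySQL compares the strings.

```python
import unicodedata

# Non-combining (precomposed) form: the accented a is the single code point U+00E1.
precomposed = "Univerzalit\u00e1s"
# Combining form: a plain a (U+0061) followed by the combining acute accent U+0301.
combining = "Univerzalita\u0301s"

# The strings render the same but are different code point sequences.
print(precomposed == combining)            # False
print(len(precomposed), len(combining))    # 13 14

# Unicode normalization converts between the two forms.
print(unicodedata.normalize("NFC", combining) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == combining)  # True
```

Normalizing records to one form (for example NFC) before loading them is one way to sidestep the mismatch, independent of which collation is chosen.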

It seems that the utf8_general_ci collation doesn’t support equality for those 
two forms. However, utf8_unicode_ci seems to work. If you have combining 
characters in your data, you may want to go with statements like: 

ALTER TABLE marc_word MODIFY word VARCHAR(255) CHARACTER SET utf8 COLLATE 
utf8_unicode_ci; 

and be sure to add init-connect = 'SET collation_connection = utf8_unicode_ci' 
to your my.cnf.

 

Thanks,

 

John

 

+----------------------------------------------------------------------------+

John Chadwick, Ed.D. Information Technology Manager

New Mexico State Library

1209 Camino Carlos Rey

Santa Fe, NM 87507

Phone: 505-476-9740  Cell: 505-629-8116 Fax: 505-476-9761

john.chadw...@state.nm.us

http://www.nmstatelibrary.org

 






_______________________________________________
Koha-devel mailing list
Koha-devel@lists.koha.org
http://lists.koha.org/mailman/listinfo/koha-devel
