Play time is over and I am now building a couple of servers using Ubuntu 8.04 and the latest git version of Koha. So far the installations have been clean and easy. I have just one more UTF-8 encoding question.
Being in the southwestern U.S., we are trying to deal with some titles that use Spanish characters alongside our primarily English titles. In looking over the Koha Wiki entry on encoding and character sets (http://wiki.koha.org/doku.php?id=encodingscratchpad), I have a question about the section on combining characters and collations. A search term may not contain the special character, but we still need to return records that do. For those who have had to deal with this, which collation would you recommend: utf8_unicode_ci or utf8_general_ci?

The wiki section reads:

The word Univerzalitás is in Unicode combining form. When you copy/paste it into a text editor or use a keyboard to type it, it is most likely going to be the non-combining form: Univerzalitás. (In the combining form, the accented a is two code points: Hex 61 followed by Hex 0301; in the non-combining form it is a single code point: Hex 00e1.)

Non-combining form:
http://www.fileformat.info/info/unicode/char/00e1/index.htm

Combining form:
http://www.fileformat.info/info/unicode/char/61/index.htm
http://www.fileformat.info/info/unicode/char/0301/index.htm

Univerzalitás (combining form)
Univerzalitás (non-combining form)

It seems that the utf8_general_ci collation doesn't support equality for those two forms. However, utf8_unicode_ci seems to work. If you have combining characters in your data, you may want to go with statements like:

    ALTER TABLE marc_word MODIFY word VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_unicode_ci;

and be sure to add

    init-connect = 'SET collation_connection = utf8_unicode_ci'

to your my.cnf.

Thanks,
John

+----------------------------------------------------------------------------+
John Chadwick, Ed.D.
Information Technology Manager
New Mexico State Library
1209 Camino Carlos Rey
Santa Fe, NM 87507
Phone: 505-476-9740
Cell: 505-629-8116
Fax: 505-476-9761
john.chadw...@state.nm.us
http://www.nmstatelibrary.org
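
To illustrate the two forms described in the wiki excerpt above, here is a minimal Python sketch. It is not from the wiki or from Koha; it assumes Python 3 and uses only the standard unicodedata module, and the helper name code_points is made up for this example. It shows that the combining and non-combining spellings are different code point sequences that only compare equal after Unicode normalization, which is the kind of equivalence the collation question is about.

```python
import unicodedata


def code_points(text):
    """Return the code points of text as 4-digit hex strings (hypothetical helper)."""
    return [f"{ord(ch):04x}" for ch in text]


# Non-combining (precomposed) form: the accented a is the single code point 00e1.
precomposed = "Univerzalit\u00e1s"
# Combining form: a plain "a" (0061) followed by the combining acute accent (0301).
combining = "Univerzalita\u0301s"

# The two strings look the same on screen but are not equal code point for code point.
print(precomposed == combining)

# NFC composes the pair 0061 0301 into 00e1; NFD decomposes 00e1 into 0061 0301.
print(unicodedata.normalize("NFC", combining) == precomposed)
print(unicodedata.normalize("NFD", precomposed) == combining)

# Code points of the accented letter in each form.
print(code_points("\u00e1"))
print(code_points(unicodedata.normalize("NFD", "\u00e1")))
```

A script like this could be used to check which form an imported record is in before loading it. Whether a given MySQL collation treats the two forms as equal still has to be tested against the database itself.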
_______________________________________________ Koha-devel mailing list Koha-devel@lists.koha.org http://lists.koha.org/mailman/listinfo/koha-devel