[Koha-devel] Converting Koha sources to UTF-8

Lars Wirzenius Wed, 24 Mar 2010 19:38:49 -0700

I would like to convert the Koha source tree to UTF-8. Having everything
in a single, modern encoding would benefit everyone, I think.


I wrote a script to find files that are not in UTF-8 (attached for
review); it uses the isutf8 tool from Joey Hess's moreutils package (see
http://kitenet.net/~joey/code/moreutils/).

The script excludes a number of files based on their suffix, so that
binary files and the like are not reported.
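
The attached script is not reproduced in this archive. As a rough sketch
of the same idea, the Python below walks a tree, skips files by suffix,
and reports files whose bytes do not decode as UTF-8. It is not the
find-nonutf8 script and does not call isutf8; the names is_utf8,
find_non_utf8 and the SKIP_SUFFIXES list are assumptions made for
illustration, since the suffixes actually excluded are not stated.

```python
import os

# Hypothetical exclusion list; the suffixes skipped by the real script are not known.
SKIP_SUFFIXES = (".png", ".gif", ".jpg", ".ico", ".pdf", ".gz", ".zip")


def is_utf8(path):
    """Return True if the file's bytes decode cleanly as UTF-8."""
    with open(path, "rb") as f:
        data = f.read()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def find_non_utf8(root, skip_suffixes=SKIP_SUFFIXES):
    """Return sorted paths under root that are not valid UTF-8."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into version-control metadata.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            # Skip likely binary files based on their suffix.
            if name.lower().endswith(skip_suffixes):
                continue
            path = os.path.join(dirpath, name)
            if not is_utf8(path):
                hits.append(path)
    return sorted(hits)
```

Calling find_non_utf8(".") from the top of a checkout would list the
candidate files. Pure ASCII files pass the check, since ASCII is a
subset of UTF-8.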

It currently reports 59 files for me. Most are just copyright symbols or
names in release notes, and all of those seem to be in the ISO-8859-1
(Latin-1) character set, so converting them is easy. I did the actual
conversion with the "iconv -f ISO-8859-1 -t UTF-8" command.
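
For illustration, the effect of that iconv command on a file's bytes can
be sketched in Python as below. The function names latin1_to_utf8 and
convert_file are hypothetical, and the check that leaves valid UTF-8
input untouched is an addition of this sketch, not something iconv does.

```python
def latin1_to_utf8(data):
    """Re-encode ISO-8859-1 bytes as UTF-8; return valid UTF-8 input unchanged."""
    try:
        data.decode("utf-8")
        return data  # Already UTF-8; converting again would double-encode it.
    except UnicodeDecodeError:
        return data.decode("iso-8859-1").encode("utf-8")


def convert_file(path):
    """Convert one file in place. Return True if the file was rewritten."""
    with open(path, "rb") as f:
        original = f.read()
    converted = latin1_to_utf8(original)
    if converted == original:
        return False
    with open(path, "wb") as f:
        f.write(converted)
    return True
```

Every byte sequence is valid ISO-8859-1, so this conversion never fails.
That also means it cannot detect a file that is really in some other
encoding, which is why the source encoding has to be confirmed by
inspection first.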

The following three files puzzle me, however:

C4/tests/testrecords/marc21_marc8_combining_chars.dat
etc/zebradb/etc/urx.chr
etc/zebradb/lang_defs/en/sort-string-utf.chr

Is it acceptable to convert them to UTF-8, or should they remain as they
are? I don't know how they are used.

find-nonutf8
Description: application/shellscript

