I would like to convert the Koha source tree to UTF-8. Having everything in a systematic, modern encoding would be beneficial for everyone, I think.
I wrote a script to find files that are not in UTF-8 (attached for review); it uses the isutf8 tool from Joey Hess's moreutils package (see http://kitenet.net/~joey/code/moreutils/). The script excludes a number of files based on their suffix, so that binary files and the like are not reported as false positives.

It currently reports 59 files for me. Most contain nothing more than copyright symbols or names in release notes, and all of those appear to be in the ISO-8859-1 (Latin-1) character set, so converting them is easy. I did the actual conversion with the "iconv -f ISO-8859-1 -t UTF-8" command.

The following three files puzzle me, however:

    C4/tests/testrecords/marc21_marc8_combining_chars.dat
    etc/zebradb/etc/urx.chr
    etc/zebradb/lang_defs/en/sort-string-utf.chr

Is it acceptable to convert them to UTF-8, or should they remain as they are? I don't know how they are used.
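For anyone who wants to try the idea without the attachment, here is a rough sketch of the approach described above. It is not the attached find-nonutf8 script: the function names, the suffix list and the .git exclusion are made up for illustration, and it checks validity with a round trip through iconv rather than with isutf8, so that it does not depend on moreutils. The in-place conversion assumes the file really is ISO-8859-1.

```shell
#!/bin/sh
# Sketch only; NOT the attached find-nonutf8 script.
# All function names and the suffix list below are hypothetical.

# is_utf8 FILE: succeed if FILE decodes cleanly as UTF-8.
# Assumption: iconv exits non-zero on an invalid input sequence.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# list_non_utf8 DIR: print every file under DIR that is not valid UTF-8,
# skipping a few binary suffixes and anything below a .git directory.
list_non_utf8() {
    find "$1" -type f \
        ! -path '*/.git/*' \
        ! -name '*.png' ! -name '*.gif' ! -name '*.jpg' ! -name '*.ico' \
        | while IFS= read -r f; do
            is_utf8 "$f" || printf '%s\n' "$f"
        done
}

# latin1_to_utf8 FILE: rewrite FILE in place, treating it as ISO-8859-1.
latin1_to_utf8() {
    tmp=$(mktemp) || return 1
    if iconv -f ISO-8859-1 -t UTF-8 "$1" >"$tmp"; then
        cat "$tmp" >"$1"    # overwrite contents, keep the file's permissions
        rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

After sourcing the file, "list_non_utf8 ." prints the candidates and "latin1_to_utf8 path/to/file" converts one of them. Any file whose encoding is in doubt, such as the three listed above, should be left out until its purpose is known, since the conversion would silently treat its bytes as Latin-1.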
find-nonutf8
Description: application/shellscript