I've been looking around at various i18n and l10n frameworks (in Perl and other languages) because I'm probably going to be undertaking the translation of an app written in Perl.
I'm just getting my feet wet, but so far I like the way KDE has handled it the most. They've also come up with a good process for getting translators working on the translations, including KBabel to help with all of that, and it's all very well documented.

I was wondering if anyone has done any work on building a library for Perl that can work with their language files (standard .pot/.po/.mo files, with some extra info encoded to make special cases work smoothly). See below for a brief overview of my understanding of how it works, or this URL, which gives a good deal of info about their translation process:

http://developer.kde.org/documentation/library/kdeqt/kde3arch/kde-i18n-howto.html

I'm also looking for suggestions on which frameworks people recommend for use with Perl, and why.

Lastly, if support for language files like the ones KDE uses isn't available for Perl at this time, I may write a module for that (or extend an existing framework). Any suggestions on where that should live on CPAN? Locale::Maketext::KDE maybe?

How KDE's i18n support works (briefly, and in a Perlish way; please forgive me, and correct me, if I get anything horribly wrong):

The function i18n() can be called in one of three ways, and always returns the translated text (or the default text, if no translation is found):

    $trans = i18n("some string");
    $trans = i18n("context", "some string");   # e.g. i18n("verb, to view", "View");
    $trans = i18n("one something", "%n somethings", $count);

There's some magic in the way the context and plural forms work. ".po" files (portable objects) are basically just a big set of key/value pairs, and they get compiled into .mo files for faster lookups.
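To make the three call forms concrete, here's a minimal sketch of how such a function might dispatch on its argument count against a flat key/value catalog. This is not KDE's code: the %catalog hash is a hypothetical stand-in for a loaded .mo file, and I'm assuming the context and the string are joined with a newline after the "_: " prefix, the same way the plural entries are. It needs Perl 5.10 or later for the // operator.

```perl
use strict;
use warnings;

# Hypothetical in-memory catalog standing in for a compiled .mo file.
my %catalog = (
    "some string"            => "some string translated",
    "_: verb, to view\nView" => "View translated",
);

# Sketch of the three call forms; an assumption, not KDE's implementation.
sub i18n {
    my @args = @_;

    if (@args == 1) {    # i18n("some string")
        my ($text) = @args;
        return $catalog{$text} // $text;
    }

    if (@args == 2) {    # i18n("context", "some string")
        my ($context, $text) = @args;
        # The context is part of the lookup key only; the fallback is
        # the bare string.
        return $catalog{"_: $context\n$text"} // $text;
    }

    # i18n("one something", "%n somethings", $count)
    my ($singular, $plural, $count) = @args;
    # Picking a translated plural form needs the language's plural rule,
    # so this sketch only shows the untranslated English fallback.
    (my $text = $count == 1 ? $singular : $plural) =~ s/%n/$count/g;
    return $text;
}

print i18n("verb, to view", "View"), "\n";
print i18n("one something", "%n somethings", 3), "\n";
```

The point of building the key from the context is that the same English word can have two different translations without either the caller or the translator having to do anything special.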
For the plain i18n("some string") call, the .po entry would be:

    msgid "some string"
    msgstr "some string translated"

Lookups with a context get a "_: " prefix automatically prepended to the context part in the .po file, so that translators know not to translate that part, and the library knows which part to use as the fallback text if no translation is available. For example:

    msgid ""
    "_: verb, to view\n"
    "View"
    msgstr ""
    "View translated"

(Note: the translation doesn't include the "_: " part, so no post-processing regexps and such are needed.)

Plurals are where I think they did a wonderful job. Each language file has a header, stored under the empty key "", e.g.:

    msgid ""
    msgstr "Header-Field: value\n"
    "Header-Field2: value2\n"

That header can have a "Plural-Forms: " entry, which is used to determine the plural forms for that language. For Russian, it's:

    "Plural-Forms: nplurals=3; plural=(n%10==1 && n%100!=11 ? 0 : n%10>"
    "=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2);\n"

There's also a "pluralType" that is pulled from another special entry and defines the language's plural type as one of NoPlural, TwoForms, French, Russian, etc. (there are 15 of them).

Then the msgid is flagged with a prefix of "_n: ", and the translator can provide however many translations are necessary, one per line. E.g. for Russian, with its three forms:

    msgid ""
    "_n: Applied: Changes made to %n line undone\n"
    "Applied: Changes made to %n lines undone"
    msgstr ""
    "<first Russian form, with %n>\n"
    "<second Russian form, with %n>\n"
    "<third Russian form, with %n>"

If multiple numbers are included in the message, %1 and %2 can be used so that they can be reordered.

As far as I've been able to figure out, it solves the nasty problems people have mentioned about GNU gettext while still using the same standard file format, so existing tools work with the files.

So... thoughts? Suggestions?

-- Josh I.
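P.S. To check my reading of the Russian rule, here's a minimal sketch of it transcribed by hand into Perl, plus a hypothetical pick_plural() helper that chooses one form from a newline-separated msgstr and fills in %n. The function names and the "form-N" strings are made up for illustration. A real module would presumably parse the Plural-Forms header rather than hard-code one language, and this assumes a non-negative integer count and Perl 5.10 or later.

```perl
use strict;
use warnings;

# Russian rule from the Plural-Forms header, transcribed by hand.
sub plural_index_ru {
    my ($n) = @_;
    return 0 if $n % 10 == 1 && $n % 100 != 11;
    return 1 if $n % 10 >= 2 && $n % 10 <= 4
             && ($n % 100 < 10 || $n % 100 >= 20);
    return 2;
}

# Hypothetical helper: pick one form from a newline-separated msgstr
# and substitute the count for %n.
sub pick_plural {
    my ($msgstr, $n) = @_;
    my @forms = split /\n/, $msgstr;
    # Fall back to the last form if the translator supplied too few.
    my $text = $forms[ plural_index_ru($n) ] // $forms[-1];
    $text =~ s/%n/$n/g;
    return $text;
}

# Placeholder forms, not real Russian text.
my $msgstr = join "\n", "%n form-0", "%n form-1", "%n form-2";
print pick_plural($msgstr, $_), "\n" for 1, 3, 5, 11, 21, 22;
```

Counts 1 and 21 pick the first form, 3 and 22 the second, and 5 and 11 the third, which is the behaviour that a plain singular/plural pair can't express.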