i18n and kde-ish support

Josh Miller Thu, 01 Dec 2005 01:44:11 -0800

I've been looking around at various i18n and l10n frameworks (in perl, and
other languages) because I'm probably going to be undertaking a
translation of an app written in perl.


I'm just getting my feet wet, but so far, I like the way kde has handled
it the most. They've also come up with a good method to get translators
working on the translations, including KBabel to aid in all of that, and
it's all very well documented.

I was wondering if anyone has done any work on building a library for
perl that could work with their language files (standard .pot/.po/.mo
type files, with some extra info encoded to make special cases work
smoothly).
(See below for a brief overview of my understanding of how it works, or
this URL, which gives a good deal of info about their translation process:
http://developer.kde.org/documentation/library/kdeqt/kde3arch/kde-i18n-howto.html)

I'm also looking for any suggestions people may have on recommended
frameworks to use with perl, and why?

Lastly, if support for language files like the ones kde uses is not
available under perl at this time, I may write a module for that (or extend
an existing framework). Any suggestions on where that should live on CPAN?
Locale::Maketext::KDE maybe?



How kde's i18n support works (briefly, and in a perlish way):
(please forgive me, and correct me, if I get anything horribly wrong)

The method "i18n()" can be called in one of three ways, and always returns
the translated text (or the default, if no translation is found).
    $trans = i18n("some string");
    $trans = i18n("context", "some string"); # eg i18n("verb, to view","View");
    $trans = i18n("one something", "%n somethings", $count);

There's some magic in the way the context and plural examples work.

".po" files (portable objects) are essentially a large set of key/value
pairs (msgid => msgstr), and they get compiled into .mo files for faster
lookups.
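
To make the key/value idea concrete, here is a rough sketch of reading .po
text into a perl hash. The names parse_po and po_unescape are made up for
illustration, and it only handles msgid/msgstr lines, quoted continuation
lines, and backslash escapes; a real parser would need to cover much more
of the format.

```perl
use strict;
use warnings;

# Undo backslash escapes: \n becomes a newline, any other escaped
# character (such as \" or \\) becomes the character itself.
sub po_unescape {
    my ($s) = @_;
    $s =~ s/\\(.)/$1 eq 'n' ? "\n" : $1/ge;
    return $s;
}

# Parse .po text into a hash of msgid => msgstr.
sub parse_po {
    my ($text) = @_;
    my (%catalog, %entry, $field);
    my $flush = sub {
        $catalog{ $entry{msgid} } = $entry{msgstr}
            if defined $entry{msgid} && defined $entry{msgstr};
        %entry = ();
    };
    for my $line (split /\n/, $text) {
        if ($line =~ /^\s*(msgid|msgstr)\s+"(.*)"\s*$/) {
            my ($keyword, $str) = ($1, $2);
            $flush->() if $keyword eq 'msgid';   # a new msgid starts a new entry
            $field = $keyword;
            $entry{$field} = po_unescape($str);
        }
        elsif (defined $field && $line =~ /^\s*"(.*)"\s*$/) {
            $entry{$field} .= po_unescape($1);   # continuation line
        }
    }
    $flush->();
    return %catalog;
}

# Sample input built from single-quoted strings, so \n stays a literal
# backslash followed by "n", exactly as it would appear in a .po file.
my $po_text = join "\n",
    'msgid "some string"',
    'msgstr "some string translated"',
    '',
    'msgid ""',
    '"_n: %n line\n"',
    '"%n lines"',
    'msgstr ""',
    '"one\n"',
    '"many"';

my %cat = parse_po($po_text);
print $cat{'some string'}, "\n";
```

Adjacent quoted strings are simply concatenated, which is why a multi-line
msgid ends up as a single hash key.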

For the first example, the .po entry would be:
    msgid "some string"
    msgstr "some string translated"
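
A minimal sketch of that lookup-with-fallback behaviour in perl, assuming
the catalog has already been loaded into a plain hash. The %catalog hash
and the i18n_simple name are hypothetical; a real library would read the
data from a .mo file.

```perl
use strict;
use warnings;

# Hypothetical in-memory catalog: msgid => msgstr.
my %catalog = (
    'some string' => 'some string translated',
);

# Return the translation, or the original text if none is found
# (an empty msgstr counts as "not translated").
sub i18n_simple {
    my ($text) = @_;
    my $trans = $catalog{$text};
    return (defined $trans && length $trans) ? $trans : $text;
}

print i18n_simple('some string'), "\n";
print i18n_simple('untranslated'), "\n";
```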

Lookups that carry a context get a "_: " prefix automatically prepended to
the context part in the .po files, so that translators know not to
translate that part, and the library knows which part to use as the
fallback text if no translation is available. Ex:
    msgid ""
    "_: verb, to view\n"
    "View"
    msgstr ""
    "View translated"
(Note: the translation doesn't include the "_: ", so no post processing
regexps and such are needed)
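
Here is a small perl sketch of the context lookup. It assumes the key is
the "_: " prefix, the context, a newline, and then the text (which is my
understanding of what kde does); %catalog and i18n_ctx are hypothetical
names used only for illustration.

```perl
use strict;
use warnings;

# Hypothetical catalog. The key is what the multi-line msgid
# concatenates to: prefix + context + newline + text.
my %catalog = (
    "_: verb, to view\nView" => 'View translated',
);

# Look up a text within a context. If there is no translation, fall
# back to the bare text, never to the context part.
sub i18n_ctx {
    my ($context, $text) = @_;
    my $trans = $catalog{"_: $context\n$text"};
    return (defined $trans && length $trans) ? $trans : $text;
}

print i18n_ctx('verb, to view', 'View'), "\n";
print i18n_ctx('noun, a scenic view', 'View'), "\n";
```

Because the translator's msgstr contains no prefix, the looked-up value can
be returned as is.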

Plurals are where I think they did a wonderful job.
Each language file has a header, with a key of "". eg.
    msgid ""
    msgstr "Header-Field: value\n"
    "Header-Field2: value2\n"
That can have a "Plural-Forms: " entry, which is used to determine
the plural forms for that language. For Russian, it's:
 "Plural-Forms:  nplurals=3; plural=(n%10==1 && n%100!=11 ? 0 : n%10>"
 "=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2);\n"

There's also a "pluralType" that is pulled from another special entry,
and defines the language's plural type as one of NoPlural, TwoForms,
French, Russian, etc (there are 15 of them).
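
As a sketch of how the header and the Russian Plural-Forms rule above could
be used from perl: the code below splits a header into fields and applies a
hand-translated version of the rule. The names parse_header and
plural_index_ru are hypothetical, and a real module would have to evaluate
the expression from the header rather than hard-code one language.

```perl
use strict;
use warnings;

# Header msgstr as one string (the .po continuation lines concatenated).
my $header = "Plural-Forms: nplurals=3; plural=(n%10==1 && n%100!=11 ? 0 : "
           . "n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2);\n";

# Split "Field: value" lines into a hash.
sub parse_header {
    my ($text) = @_;
    my %fields;
    for my $line (split /\n/, $text) {
        my ($name, $value) = $line =~ /^([^:]+):\s*(.*)$/ or next;
        $fields{$name} = $value;
    }
    return %fields;
}

# The Russian rule, translated by hand from the C-style expression.
sub plural_index_ru {
    my ($n) = @_;
    return 0 if $n % 10 == 1 && $n % 100 != 11;
    return 1 if $n % 10 >= 2 && $n % 10 <= 4
             && ($n % 100 < 10 || $n % 100 >= 20);
    return 2;
}

my %h = parse_header($header);
my ($nplurals) = $h{'Plural-Forms'} =~ /nplurals=(\d+)/;
print "nplurals=$nplurals\n";
print "$_ -> ", plural_index_ru($_), "\n" for 1, 2, 5, 11, 21, 22;
```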

Then, the "msgid" is flagged with a prefix of "_n: ", and the translator
can provide however many translations are necessary. Eg. for Russian:
    msgid ""
    "_n: Applied: Changes made to %n line undone\n"
    "Applied: Changes made to  %n lines undone"
    msgstr ""
    "(Russian form 0, containing %n; Cyrillic text garbled in the archive)\n"
    "(Russian form 1, containing %n; Cyrillic text garbled in the archive)\n"
    "(Russian form 2, containing %n; Cyrillic text garbled in the archive)"
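
Putting the plural pieces together, a lookup might work roughly like this
in perl. This is only a sketch: %catalog, i18n_plural and plural_index_ru
are hypothetical names, and the strings FORM0, FORM1 and FORM2 are
placeholders standing in for the three Russian forms.

```perl
use strict;
use warnings;

# Hypothetical catalog. The key is "_n: " + singular + newline + plural;
# the value holds one translated form per line.
my %catalog = (
    "_n: %n file\n%n files" => "%n FORM0\n%n FORM1\n%n FORM2",
);

# Russian plural rule, hand-translated from the Plural-Forms header.
sub plural_index_ru {
    my ($n) = @_;
    return 0 if $n % 10 == 1 && $n % 100 != 11;
    return 1 if $n % 10 >= 2 && $n % 10 <= 4
             && ($n % 100 < 10 || $n % 100 >= 20);
    return 2;
}

# Pick the right form for $n and substitute %n. Without a translation,
# fall back to the untranslated singular or plural text.
sub i18n_plural {
    my ($singular, $plural, $n) = @_;
    my $trans = $catalog{"_n: $singular\n$plural"};
    my $text;
    if (defined $trans && length $trans) {
        my @forms = split /\n/, $trans;
        my $idx   = plural_index_ru($n);
        $idx  = $#forms if $idx > $#forms;   # guard against missing forms
        $text = $forms[$idx];
    }
    else {
        $text = $n == 1 ? $singular : $plural;
    }
    $text =~ s/%n/$n/g;
    return $text;
}

print i18n_plural('%n file', '%n files', $_), "\n" for 1, 3, 5, 21;
```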

If multiple numbers are included in the message, %1 and %2 can be used
so that they may be re-ordered.
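
A tiny perl sketch of that kind of positional substitution (fill_args is a
made-up name, and the message texts are invented examples):

```perl
use strict;
use warnings;

# Replace %1, %2, ... with the corresponding argument (1-based).
# Placeholders with no matching argument are left untouched.
sub fill_args {
    my ($template, @args) = @_;
    $template =~ s/%(\d+)/ $1 >= 1 && $1 <= @args ? $args[$1 - 1] : "%$1" /ge;
    return $template;
}

print fill_args('Copied %1 of %2 files', 3, 10), "\n";
print fill_args('Of %2 files, %1 copied', 3, 10), "\n";
```

The second call shows a translation that puts the arguments in a different
order from the original message.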


As far as I've been able to figure, it solves the nasty problems people
have mentioned about GNU gettext, and can still use the same standard
format of files, so existing tools work with them.


So... thoughts? suggestions?
--
Josh I.
