Hello,

After finding out that I have commit access to the project's repo, I started
working on fixing the i18n issues I have observed in Wormux.

Since I want to educate people into doing the right thing (tm) wrt
i18n, I decided to explain how proper i18n should be done so it is usable.


What I am going to write about is ngettext, the tool you want to use when
you write strings that must differ depending on a count. There is also a nice
article on the subject, maybe better than my mail, which I recommend [*].

So this applies to something like this:

You won 1 ninja rope!
You won 2 ninja ropes!

In gettext terminology these are called "plural forms", although this is slightly
confusing/incorrect since it includes the singular form, too. Don't be
confused :-) !


So here goes:


I have found this "nice" bit in the code:
-----------8<----------
void BonusBox::ApplyBonus(Team &equipe, Character &ver) {
[..]
    /*this next 'if' should never be true, but I am loath to remove it just in case. */
    if(equipe.ReadNbAmmos(WeaponsList::GetInstance()->GetWeapon(contents)->GetName())!=INFINITE_AMMO)
    {
        equipe.m_nb_ammos[ WeaponsList::GetInstance()->GetWeapon(contents)->GetName() ] += nbr_ammo;
        txt << Format(ngettext(
                "%s team has won %u %s!",
                "%s team has won %u %ss!",
                2),
            equipe.GetName().c_str(), nbr_ammo, WeaponsList::GetInstance()->GetWeapon(contents)->GetName().c_str());
    }
    else {
        txt << Format(gettext("%s team already has infinite ammo for the %s!"), //this should never appear
            equipe.GetName().c_str(), WeaponsList::GetInstance()->GetWeapon(contents)->GetName().c_str());
[..]
}
-----------8<----------


This bit is broken from an i18n POV in about three different ways. And that in
only 17 lines of code... :-P .


-----------------------
So, the first and most obvious one is the hardcoded "2" passed as the count
value for the string:

Format(ngettext(
                "%s team has won %u %s!",
                "%s team has won %u %ss!",
                2),
            equipe.GetName().c_str(), nbr_ammo, WeaponsList::GetInstance()->GetWeapon(contents)->GetName().c_str());


This could mean 2 things:
a) either the coder expected that the count value will always be 2, which means
that "nbr_ammo" will always be 2,
b) or the coder assumed that "nbr_ammo" will always need the plural and thought
that the way to guarantee this was to pass a count that selects the plural,
reasoning in an English-centric way about how the plural is formed in English.


Now, let's take each of them and see how this should have been done properly.
a) If nbr_ammo is expected to always be 2, then it makes no sense to use
ngettext; it is better to write:

gettext("%s team has won 2 ...")

and let the translator produce the proper translation, since he/she knows best
how to translate that properly into his/her own language.

(I am sure this was not the case)

b) If this hypothesis is correct, then the developer should trust gettext more
and let it do its thing: the count passed to ngettext should be the same one
that is displayed, which is, after all, what ngettext was designed for.


Why is the hard-coded value not OK? Simple: other languages have more than 2
plural forms, so they *need* that value to be specified correctly so that
ngettext chooses the *right* plural form for the corresponding count.

.....................
Let me give an example:
The plural formula for Romanian is the following (you can check the header of 
po/ro.po):

Plural-Forms: nplurals=3; plural=n==1 ? 0 : (n==0 || (n%100 > 0 && n%100 < 20)) ? 1 : 2;

That looks a little bit more complex than the one for English, doesn't it ;-) ?

What that means is that the way plural is formed depends more strongly on the 
number of counted items.

So you get something like [1][2]:

> 0 errors    = 0 erori
> 1 error     = 1 eroare
> 2 errors    = 2 erori
> 19 errors   = 19 erori
> 20 errors   = 20 de erori
> 25 errors   = 25 de erori
> 99 errors   = 99 de erori
> 100 errors  = 100 de erori
> 101 errors  = 101 erori
> 102 errors  = 102 erori
> ...
> 119 errors  = 119 erori
> 120 errors  = 120 de erori
> 500 errors  = 500 de erori
> 501 errors  = 501 erori
> 519 errors  = 519 erori
> 520 errors  = 520 de erori

So, depending on the number of items, a preposition is added.
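
To make the formula less abstract, here is a minimal C++ sketch that transcribes
the Plural-Forms expression above and applies it to the "errors" list. The names
RomanianPluralIndex and CountErrorsRo are made up for this mail; real code would
never hard-code the forms, it would let ngettext pick them from ro.po.

```cpp
#include <string>

// Index of the plural form selected for Romanian, transcribed from the
// Plural-Forms header quoted above (nplurals=3).
unsigned RomanianPluralIndex(unsigned long n) {
    if (n == 1) return 0;                                   // "1 eroare"
    if (n == 0 || (n % 100 > 0 && n % 100 < 20)) return 1;  // "2 erori"
    return 2;                                               // "20 de erori"
}

// Hypothetical helper, for illustration only: in real code ngettext reads the
// three forms from the .po catalog instead of a hard-coded array.
std::string CountErrorsRo(unsigned long n) {
    static const char* const kForms[] = {"eroare", "erori", "de erori"};
    return std::to_string(n) + " " + kForms[RomanianPluralIndex(n)];
}
```

With a hard-coded count of 2 the index is always 1, so "20 de erori" could never
be produced, no matter what the translator writes.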

Also, in other languages the differences are even more pronounced (quoted from [*]):

In Polish we use e.g. plik (file) this way:
          1 plik
          2,3,4 pliki
          5-21 pliko'w
          22-24 pliki
          25-31 pliko'w

So you see, there's more to ngettext than meets the eye ;-)
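
For comparison, a sketch of the Polish rule as a C++ function. The expression is
the one the gettext manual lists for Polish, as far as I remember it
(nplurals=3; plural=n==1 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2);
the function name is made up for illustration.

```cpp
// Index of the Polish plural form: 0 -> "plik", 1 -> "pliki", 2 -> "pliko'w".
// Assumption: this mirrors the Plural-Forms rule given for Polish in the
// gettext manual; check po/pl.po before relying on it.
unsigned PolishPluralIndex(unsigned long n) {
    if (n == 1) return 0;
    if (n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20)) return 1;
    return 2;
}
```

PolishPluralIndex(2) is 1 but PolishPluralIndex(5) is 2, so a hard-coded count
of 2 would select the "pliki" form even when 5 items are displayed.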
.....................


The (almost - see below) correct thing to write would be:

Format(ngettext(
                "%s team has won %u %s!",
                "%s team has won %u %ss!",
                nbr_ammo),
            equipe.GetName().c_str(), nbr_ammo, WeaponsList::GetInstance()->GetWeapon(contents)->GetName().c_str());


That concludes the analysis of the first mistake.
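
For those who want to try it outside the game, a small self-contained sketch of
the same idea. WonAmmoMessage is a made-up stand-in for the Format(...) call in
BonusBox::ApplyBonus, and snprintf replaces Format; only ngettext itself (from
<libintl.h>) is the real thing.

```cpp
#include <libintl.h>  // ngettext
#include <cstdio>     // std::snprintf
#include <string>

// Hypothetical stand-in for the game's Format(ngettext(...), ...) call.
// The count given to ngettext is the very same number that gets displayed.
std::string WonAmmoMessage(const char* team, unsigned count, const char* weapon) {
    const char* fmt = ngettext("%s team has won %u %s!",
                               "%s team has won %u %ss!",
                               count);
    char buf[256];
    std::snprintf(buf, sizeof buf, fmt, team, count, weapon);
    return buf;
}
```

Without a loaded catalog ngettext falls back to the English rule (first string
for a count of 1, second string otherwise); with ro.po loaded it would use the
three-form rule from the header. The "%ss" is still wrong, see the second
mistake.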

-----------------------

The second one is a more subtle mistake. People who mainly speak languages like
English, which have only two forms (one for singular and one for plural, used
whenever the count is different from 1), are prone to miss the problem.

Of course, this can be fixed, too.

The problem is, as some might have suspected already, that although the number
of forms is clear, the forms themselves might differ in prefix/suffix/shape
depending on the word that designates the counted item. This means that you
can't write something like:

ngettext(
                "%s team has won %u %s!",
                "%s team has won %u %ss!",
                nbr_ammo),
            equipe.GetName().c_str(), nbr_ammo, WeaponsList::GetInstance()->GetWeapon(contents)->GetName().c_str()

since the ending hard-coded in "%ss" might be different for different items.

Example for Romanian:
1 horse   /2 horses   /20 horses    = 1 cal       /2 cai       /20 de cai
1 bomb    /2 bombs    /20 bombs     = 1 bombă     /2 bombe     /20 de bombe
1 polecat /2 polecats /20 polecats  = 1 nevăstuică/2 nevăstuici/20 de nevăstuici
1 launcher/2 launchers/20 launchers = 1 lansator  /2 lansatoare/20 de lansatoare

So as you can see, the endings change from word to word, sometimes by replacing
the singular ending (l->i, ă->e, ă->i) and sometimes by altering it slightly
(tor->toare), so hard-coding the ending is broken.

The solution would be to have something like:


weapon_string =
    nLocalizeWeaponName(WeaponsList::GetInstance()->GetWeapon(contents)->GetName().c_str(), nbr_ammo);

ngettext(
                "%s team has won %u %s!",
                "%s team has won %u %s!",
                nbr_ammo),
            equipe.GetName().c_str(), nbr_ammo, weapon_string)

(note that the plural msgid no longer appends an "s", since weapon_string
already comes in the form that matches nbr_ammo)


The function nLocalizeWeaponName(<string>, unsigned long int n) must be a
wrapper over ngettext and should make sure each weapon name is treated
separately, for the reasons above, so that a different msgid/msgid_plural pair
appears for each weapon and each one can be translated properly. It could be a
header with an array of strings that the function picks from, or maybe a
variable in the weapon object; I haven't thought the solution through.
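
One possible shape for such a wrapper, as a rough sketch only. The weapon
identifiers ("minigun", "polecat", "ninja_rope") are assumptions, since I don't
list here what GetName() really returns, and spelling out one ngettext call per
weapon is just the simplest way to let xgettext see every msgid/msgid_plural
pair.

```cpp
#include <libintl.h>  // ngettext
#include <string>

// Sketch of nLocalizeWeaponName: one explicit ngettext call per weapon, so
// each weapon gets its own msgid/msgid_plural entry in the .po files.
// The identifiers compared against are hypothetical.
std::string nLocalizeWeaponName(const std::string& weapon_id, unsigned long n) {
    if (weapon_id == "minigun")
        return ngettext("minigun", "miniguns", n);
    if (weapon_id == "polecat")
        return ngettext("polecat", "polecats", n);
    if (weapon_id == "ninja_rope")
        return ngettext("ninja rope", "ninja ropes", n);
    return weapon_id;  // unknown weapon: fall back to the raw identifier
}
```

A table-driven version is possible too, but then the strings have to be marked
for extraction in some other way, which is why the sketch keeps the calls
explicit.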


So this is more complicated, but can be solved, too.

BTW, don't worry, the translators will not curse; they will appreciate that this
subject was cared for so they can translate properly into their languages.
-----------------------

If you read until this point, congrats! I owe you a beer, and I'll tell you a
secret: the next point is not about ngettext, but about the way strings should
be written, and it involves a simple gettext example.

The third point is related to this:
gettext("%s team already has infinite ammo for the %s!")


Looks fine at first, but the developer forgot that not all languages form the
articulated (definite) form with a separate word placed next to the main one.
Also, in some languages, the articulated form doesn't always make sense,
depending on the word.

Let's take another example:
%s team already has infinite ammo for the minigun!
Echipa %s are deja muniţie infinită pentru mitralieră!
(not the articulated form)

%s team already has infinite ammo for the polecat launcher!
Echipa %s are deja muniţie infinită pentru lansatorul de nevăstuici!
(the article is appended to the first word)


%s team already has infinite ammo for the uzi!
Echipa %s are deja muniţie infinită pentru Uzi!
(not needed/name)


Again, the solution here would be to have a wrapper that treats each weapon name
separately, since they are handled differently in some languages. So the
conclusion is that context counts, and it is necessary to give the translator
the possibility to translate the game into his/her language properly, without
bastardizing it.
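
A rough sketch of what such a wrapper could look like, under the same
assumptions as before: the weapon identifiers and both function names are made
up, and snprintf stands in for Format. The point is only that "the" moves out of
the sentence and into per-weapon msgids, so a translator can render
"the minigun" as "mitralieră", "the polecat launcher" as "lansatorul de
nevăstuici" and "the uzi" as "Uzi".

```cpp
#include <libintl.h>  // gettext
#include <cstdio>     // std::snprintf
#include <string>

// Hypothetical wrapper: each weapon has its own msgid including the article,
// so every language can decide per weapon whether and how to articulate it.
std::string LocalizeDefiniteWeaponName(const std::string& weapon_id) {
    if (weapon_id == "minigun") return gettext("the minigun");
    if (weapon_id == "polecat") return gettext("the polecat launcher");
    if (weapon_id == "uzi")     return gettext("the uzi");
    return weapon_id;  // unknown weapon: fall back to the raw identifier
}

// The sentence itself no longer contains a hard-coded "the".
std::string InfiniteAmmoMessage(const char* team, const std::string& weapon_id) {
    const std::string weapon = LocalizeDefiniteWeaponName(weapon_id);
    const char* fmt = gettext("%s team already has infinite ammo for %s!");
    char buf[256];
    std::snprintf(buf, sizeof buf, fmt, team, weapon.c_str());
    return buf;
}
```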

-----------------------



CONCLUSION: i18n is more complex than some might expect, and gettext/ngettext
can help, but proper code must be written to support this. l10n can't happen
properly if efforts are not made to do i18n properly.


In case someone is wondering: yes, I will make a patch and fix these issues,
but I'll beat^W educate anybody who repeats these mistakes after my fix by
pointing them to this message ;-) so they can learn what was written in it :-) .


Have a nice day and think about the poor translators! :-)

-----------------------
[*] http://olympus.het.brown.edu/cgi-bin/info2www?(gettext)Plural+forms
[1] cut, edit & paste from an older explanation: http://mail.kde.org/pipermail/kde-i18n-ro/2005-August/000067.html
[2] summary at: http://mail.kde.org/pipermail/kde-i18n-ro/2005-August/000070.html
-- 
Regards,
EddyP
=============================================
"Imagination is more important than knowledge" A.Einstein
