-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello,

After finding out that I have commit access to the project's repo, I
started working on fixing the i18n issues I have observed in Wormux.
Since I want to educate people into doing the right thing (tm) wrt
i18n, I decided to explain how i18n should be done properly so that it
is usable.

What I am going to write about is ngettext, the tool you want to use
when you write strings that should differ depending on a count. Also, a
nice article, maybe better than my mail, is available and I recommend
it[*].

So this applies to something like this:

  You won 1 ninja rope!
  You won 2 ninja ropes!

In gettext terminology these are called "plural forms", although this is
slightly confusing/incorrect since it involves the form for singular,
too. Don't be confused :-) !

So here goes: I have found this "nice" bit in the code:

- -----------8<----------
void BonusBox::ApplyBonus(Team &equipe, Character &ver) {
[..]
  /*this next 'if' should never be true, but I am loath to remove it just in case. */
  if(equipe.ReadNbAmmos(WeaponsList::GetInstance()->GetWeapon(contents)->GetName())!=INFINITE_AMMO) {
    equipe.m_nb_ammos[ WeaponsList::GetInstance()->GetWeapon(contents)->GetName() ] += nbr_ammo;
    txt << Format(ngettext(
            "%s team has won %u %s!",
            "%s team has won %u %ss!",
            2),
        equipe.GetName().c_str(), nbr_ammo,
        WeaponsList::GetInstance()->GetWeapon(contents)->GetName().c_str());
  }
  else {
    txt << Format(gettext("%s team already has infinite ammo for the %s!"), //this should never appear
        equipe.GetName().c_str(),
        WeaponsList::GetInstance()->GetWeapon(contents)->GetName().c_str());
[..]
}
- -----------8<----------

This bit is broken from an i18n POV in about three different ways. And
that in only 17 lines of code... :-P

- -----------------------

So, the first and most obvious one is the hardcoded "2" as the count
value for the string:

    Format(ngettext(
            "%s team has won %u %s!",
            "%s team has won %u %ss!",
            2),
        equipe.GetName().c_str(), nbr_ammo,
        WeaponsList::GetInstance()->GetWeapon(contents)->GetName().c_str());

This could mean two things:

a) either the coder expected that the count will always be 2, meaning
   that "nbr_ammo" will always be 2,
b) or the coder thought that "nbr_ammo" will always need the plural, and
   thought the way to make that happen is to force a count that selects
   the plural, thinking in an English-centric way about how the plural
   is formed in English.

Now, let's take each of them and see how this should have been done
properly.

a) If nbr_ammo is expected to be 2, then it makes no sense to use
   ngettext; better write:
       gettext("%s team has won 2 ...")
   and let the translator produce the proper translation, since he/she
   knows best how to translate that in his/her own language. (I am sure
   this was not the case.)

b) If this hypothesis is correct, then the developer should trust
   gettext more and let it do its thing: the count passed to ngettext
   should be the same variable that is used for the display. This is
   what ngettext was designed for, after all.

Why is the hardcoded value not OK? Simple: other languages have more
than 2 plural forms, so they *need* that value to be correctly specified
so that ngettext chooses the *right* plural form for the corresponding
count.

.....................
Let me give an example. The plural formula for Romanian is the following
(you can check the header of po/ro.po):

  Plural-Forms: nplurals=3; plural=n==1 ? 0 : (n==0 || (n%100 > 0 && n%100 < 20)) ? 1 : 2;

That looks a little bit more complex than the one for English, doesn't
it ;-) ? What that means is that the way the plural is formed depends
more strongly on the number of counted items.

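Read as code, that formula is just a function from the count to the
index of the form to use. Here is a minimal sketch that transcribes it
into plain C++. The name romanian_plural_index is my own invention for
illustration; gettext evaluates the formula from the .po header by
itself, so nothing like this has to exist in Wormux.

```cpp
// Transcription of the Romanian Plural-Forms expression from po/ro.po.
// Returns the index of the translated form ngettext would pick:
//   0 = singular ("eroare"), 1 = plural ("erori"), 2 = plural with "de".
// NOTE: hypothetical helper, for illustration only.
constexpr unsigned romanian_plural_index(unsigned long n) {
    return n == 1 ? 0u
         : (n == 0 || (n % 100 > 0 && n % 100 < 20)) ? 1u
         : 2u;
}

// A few counts checked at compile time.
static_assert(romanian_plural_index(1) == 0, "1 eroare");
static_assert(romanian_plural_index(2) == 1, "2 erori");
static_assert(romanian_plural_index(20) == 2, "20 de erori");
static_assert(romanian_plural_index(101) == 1, "101 erori");
```

With a hardcoded count of 2 the index is always 1, so a Romanian player
who wins 20 items would get the wrong form.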
So you have something like[1][2]:

> 0 errors = 0 erori
> 1 error = 1 eroare
> 2 errors = 2 erori
> 19 errors = 19 erori
> 20 errors = 20 de erori
> 25 errors = 25 de erori
> 99 errors = 99 de erori
> 100 errors = 100 de erori
> 101 errors = 101 erori
> 102 errors = 102 erori
> ...
> 119 errors = 119 erori
> 120 errors = 120 de erori
> 500 errors = 500 de erori
> 501 errors = 501 erori
> 519 errors = 519 erori
> 520 errors = 520 de erori

So depending on the number of items, a preposition is added. For other
languages the differences are even more severe (quoted from [*]):

  In Polish we use e.g. plik (file) this way:
  1 plik
  2,3,4 pliki
  5-21 pliko'w
  22-24 pliki
  25-31 pliko'w

So you see, there's more to ngettext than meets the eye ;-)
.....................

The (almost - see below) correct thing to write would be:

    Format(ngettext(
            "%s team has won %u %s!",
            "%s team has won %u %ss!",
            nbr_ammo),
        equipe.GetName().c_str(), nbr_ammo,
        WeaponsList::GetInstance()->GetWeapon(contents)->GetName().c_str());

That concludes the analysis of the first mistake.

- -----------------------

The second one is a more subtle mistake. People who mainly speak
languages like English, which has only two forms (one for singular and
one for plural, the latter used whenever the count is different from 1),
are prone to miss the problem. Of course, this can be fixed, too.

The problem is, as some might have suspected already, that although the
number of forms is fixed, the forms themselves might differ in
prefix/suffix/shape depending on the word that designates the counted
item. This means that you can't write something like:

    Format(ngettext(
            "%s team has won %u %s!",
            "%s team has won %u %ss!",
            nbr_ammo),
        equipe.GetName().c_str(), nbr_ammo,
        WeaponsList::GetInstance()->GetWeapon(contents)->GetName().c_str());

since the final "%ss" might be different for different items.

Example for Romanian:

  1 horse   /2 horses   /20 horses    = 1 cal        /2 cai        /20 de cai
  1 bomb    /2 bombs    /20 bombs     = 1 bombă      /2 bombe      /20 de bombe
  1 polecat /2 polecats /20 polecats  = 1 nevăstuică /2 nevăstuici /20 de nevăstuici
  1 launcher/2 launchers/20 launchers = 1 lansator   /2 lansatoare /20 de lansatoare

So as you can see, the endings change from word to word, sometimes by
replacing the singular ending (l->i, ă->e, ă->i) and sometimes by
changing it slightly (tor->toare), so hardcoding the ending is broken.

The solution would be to have something like:

    weapon_string = nLocalizeWeaponName(
        WeaponsList::GetInstance()->GetWeapon(contents)->GetName().c_str(),
        nbr_ammo);
    Format(ngettext(
            "%s team has won %u %s!",
            "%s team has won %u %s!", /* no "%ss": weapon_string is already in the right form */
            nbr_ammo),
        equipe.GetName().c_str(), nbr_ammo, weapon_string);

The function nLocalizeWeaponName(<string>, unsigned long int n) must be
a wrapper over ngettext and should make sure each weapon name is treated
separately, for the reasons above, so that a separate msgid/msgid_plural
pair appears for each weapon and it is possible to translate properly.
Probably a header with an array of the strings from which the function
picks, or maybe this should be a variable in the object; I haven't
thought thoroughly about the solution. So this is more complicated, but
it can be solved, too.

BTW, don't worry, the translators will not curse; they will appreciate
that this subject was cared for so they can translate properly into
their languages.

- -----------------------

If you have read this far, congrats! I owe you a beer, and I'll tell you
a secret: this one is not about ngettext, but about the way strings
should be written, and here is a simple gettext example.

The third point is related to this:

    gettext("%s team already has infinite ammo for the %s!")

Looks fine at first, but the developer forgot that not all languages
build the articulated (definite) form with a separate word next to the
main one.
Also, in some languages the articulated form doesn't always make sense,
depending on the word. Let's take another example:

  %s team already has infinite ammo for the minigun!
  Echipa %s are deja muniţie infinită pentru mitralieră!
  (not the articulated form)

  %s team already has infinite ammo for the polecat launcher!
  Echipa %s are deja muniţie infinită pentru lansatorul de nevăstuici!
  (the article is appended to the first word)

  %s team already has infinite ammo for the uzi!
  Echipa %s are deja muniţie infinită pentru Uzi!
  (not needed/name)

Again, the solution here would be to have a wrapper that treats each
weapon name separately, since they are handled differently in some
languages.

So the conclusion is that context counts, so it is necessary to give the
translator the possibility to translate the game into his/her language
properly, without bastardizing it.

- -----------------------

CONCLUSION: i18n is more complex than some might expect and
gettext/ngettext can help, but proper code must be written to support
this. l10n can't happen properly if efforts are not made to do i18n
properly.

In case someone is wondering: yes, I will make a patch and fix these
issues, but I'll beat^W educate anybody who repeats these mistakes after
my fix by pointing them to this message ;-) so they learn what was
written in it :-) .

Have a nice day and think about the poor translators!
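
To make the wrapper idea from the second and third points a bit more
concrete, here is a minimal sketch, not the actual patch. Everything in
it is an assumption on my side: the WeaponId enum, the helper names
WonAmmoMessage and InfiniteAmmoMessage, and the use of snprintf instead
of the game's Format(). The only real API used is gettext()/ngettext()
from <libintl.h>. Each weapon gets its own literal strings, so xgettext
can extract a separate entry per weapon.

```cpp
#include <libintl.h>  // gettext, ngettext
#include <cstdio>     // std::snprintf
#include <string>

// Hypothetical weapon identifiers, for illustration only.
enum class WeaponId { NinjaRope, Minigun, PolecatLauncher };

// One literal msgid/msgid_plural pair per weapon, selected by the real count.
const char* nLocalizeWeaponName(WeaponId id, unsigned long n) {
    switch (id) {
    case WeaponId::NinjaRope:
        return ngettext("ninja rope", "ninja ropes", n);
    case WeaponId::Minigun:
        return ngettext("minigun", "miniguns", n);
    case WeaponId::PolecatLauncher:
        return ngettext("polecat launcher", "polecat launchers", n);
    }
    return "";
}

// The sentence itself still goes through ngettext with the same count,
// because in some languages the rest of the sentence depends on it too.
std::string WonAmmoMessage(const std::string& team, WeaponId id, unsigned long n) {
    const char* fmt = ngettext("%s team has won %lu %s!",
                               "%s team has won %lu %s!", n);
    char buf[256];
    std::snprintf(buf, sizeof buf, fmt, team.c_str(), n,
                  nLocalizeWeaponName(id, n));
    return buf;
}

// Third point: one full sentence per weapon, so the translator decides
// whether and how the weapon name takes an article.
std::string InfiniteAmmoMessage(const std::string& team, WeaponId id) {
    const char* fmt = "";
    switch (id) {
    case WeaponId::NinjaRope:
        fmt = gettext("%s team already has infinite ammo for the ninja rope!");
        break;
    case WeaponId::Minigun:
        fmt = gettext("%s team already has infinite ammo for the minigun!");
        break;
    case WeaponId::PolecatLauncher:
        fmt = gettext("%s team already has infinite ammo for the polecat launcher!");
        break;
    }
    char buf[256];
    std::snprintf(buf, sizeof buf, fmt, team.c_str());
    return buf;
}
```

Without a translation catalog loaded, ngettext falls back to the English
strings, so WonAmmoMessage("Red", WeaponId::Minigun, 3) gives "Red team
has won 3 miniguns!". A table of strings instead of the switch would
also work, but then the strings need to be marked for extraction
separately.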
:-)

- -----------------------
[*] http://olympus.het.brown.edu/cgi-bin/info2www?(gettext)Plural+forms
[1] cut, edit & paste from an older explanation:
    http://mail.kde.org/pipermail/kde-i18n-ro/2005-August/000067.html
[2] summary at:
    http://mail.kde.org/pipermail/kde-i18n-ro/2005-August/000070.html

- -- 
Regards,
EddyP
=============================================
"Imagination is more important than knowledge" A.Einstein
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFF1Q5XY8Chqv3NRNoRApUYAKCneFZG22cEURGmygmFU3bqs71wVgCgpFv6
O6YP+Y6iMGLmRRPMawuV12s=
=gQuq
-----END PGP SIGNATURE-----
_______________________________________________
Wormux-dev mailing list
Wormux-dev@gna.org
https://mail.gna.org/listinfo/wormux-dev