priv.onet.pl)

Przemyslaw Czerpak Fri, 28 Nov 2008 13:40:22 -0800

On Fri, 28 Nov 2008, Mindaugas Kavaliauskas wrote:

Hi Mindaugas,


> I was thinking about:
>    {"LANGUAGE"=>"PL_PL", "CPID"=>"PLISO", "TABLE"=>{...}}
> but using numeric keys is also possible.

In that case the translations will sit in a separate table
nested inside another table. If we accept that, it may
be cleaner and easier to manage, so it's OK for me.

>> 2. how to keep translation in final binary files.
>>    hb_itemSerialize() seems to be natural form
> I think hb_itemSerialize() is good for the table storage, but language and
> cpid could be stored in a header. A header containing at least a signature and
> file version would be a good idea. It can help various tools identify the
> file without decoding serialized data, e.g., /etc/magic

OK.
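
A minimal sketch of the header idea, in Python rather than Harbour. The magic bytes, the field layout and the use of JSON as a stand-in for hb_itemSerialize() are all assumptions made for illustration, not a proposed file format.

```python
import json
import struct

SIGNATURE = b"\x00I18N"   # hypothetical magic bytes, chosen for this sketch only
VERSION = 1               # hypothetical file version
_FIXED = "<HBB"           # version, len(language), len(cpid); little-endian, unpadded

def pack_translation(language, cpid, table):
    """Return bytes: signature, fixed header, language, cpid, serialized table."""
    lang = language.encode("ascii")
    cp = cpid.encode("ascii")
    # JSON stands in for hb_itemSerialize() here.
    body = json.dumps(table, ensure_ascii=False).encode("utf-8")
    header = SIGNATURE + struct.pack(_FIXED, VERSION, len(lang), len(cp))
    return header + lang + cp + body

def read_header(data):
    """Identify the file and read language/cpid without decoding the table."""
    if data[:len(SIGNATURE)] != SIGNATURE:
        raise ValueError("not a translation file")
    off = len(SIGNATURE)
    version, n_lang, n_cp = struct.unpack_from(_FIXED, data, off)
    off += struct.calcsize(_FIXED)
    language = data[off:off + n_lang].decode("ascii")
    cpid = data[off + n_lang:off + n_lang + n_cp].decode("ascii")
    return version, language, cpid
```

A tool like file(1) would only need the first few bytes; read_header() never touches the serialized body.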

>> 5. For easy use it's necessary to have in core code function which will
>>    work like printf(). Otherwise it will be hard to create strings for
>>    translations which are not context dependent.
>>    Such function should use basic escape sequences like %s or %d
>>    compatible with C so we can use dedicated tools to operate on .pot
>>    files. Anyhow for better flexibility we should add support for stringify
>>    and format any item size. F.e. %s should also work for numeric, date and
>>    logical items. In general it should be possible to create formatting
>>    similar to the one given by transform and the picture clause. We can add
>>    support for passing the picture clause directly in the formatted
>>    string or as a parameter. We can also add some additional extensions.
>>    The function name is less important. We can call it hb_strFormat().
>>    It's time to implement it.
> Yes. We need it. The question is whether we have to be C compatible or need
> to implement our own Harbour-specific type specifiers.
> A simple implementation could be:
>   %d = LTRIM(STR(nValue)) or LTRIM(STR(ROUND(nValue, 0))) ???
>   %s = cValue or TRIM(cValue) ???
> but we will always want some new extensions. I still cannot decide
> what the specifiers should be. Do you have some ideas about it?
> From the i18n point of view the %1$d extension is very important. It is not
> related to the specifier type problem, so we need to implement it for sure.

Just like you, I still cannot make a final decision.
Parameter indexing is a must, though.
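
To make the parameter indexing point concrete, here is a rough Python sketch of a formatter that accepts %s, %d, %% and the positional forms %1$s / %1$d. The name str_format and the rounding used for %d are assumptions; this is not the hb_strFormat() design, which is still undecided.

```python
import re

# Optional "N$" position, then one of the supported specifiers.
_SPEC = re.compile(r"%(?:(\d+)\$)?([sd%])")

def str_format(fmt, *args):
    """Format with C-like %s/%d plus %N$ positional parameters (1-based)."""
    next_arg = 0

    def repl(match):
        nonlocal next_arg
        index, kind = match.group(1), match.group(2)
        if kind == "%":
            return "%"                      # literal percent sign
        if index is not None:
            value = args[int(index) - 1]    # explicit position
        else:
            value = args[next_arg]          # next unnamed parameter
            next_arg += 1
        if kind == "d":
            return str(int(round(value)))   # assumption: %d rounds to integer
        return str(value)                   # %s stringifies any item

    return _SPEC.sub(repl, fmt)
```

With indexing a translator can reorder the parameters freely, e.g. turn "%1$d files in %2$s" into "%2$s has %1$d files" without touching the calling code.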

>> 6. We should decide if we want to add support for plural form translations.
>>    Now it's not supported by us. In many cases we can live without it but
>>    sometimes it's useful. It will be important for final translation
>>    representation though seems that if we will use hashes then it can be
>>    added later. We can simply add translation for plural forms as additional
>>    hash item attributes.
> In Lithuanian, plural forms do not always solve the problem,
> because we need more than 2 word endings. AFAIK, the same problem exists for
> Polish translations. I guess we can live without plurals.

This is resolved in gettext, at least for European languages.
Usually it's possible to define a limited number of basic
rules which depend on the given number. F.e. in Polish there are
3 forms: for 1, for numbers ending with 2,3,4 except 11,12,...,19
and for the rest.
   1 dom
   2 domy, 3 domy, 4 domy, 22 domy, 34 domy
   5 domów, 6 domów, 11 domów, 13 domów, 31 domów, 45 domów

In Lithuanian you have also 3 forms:
   1. for numbers ending with 10, 11, 12, 13, ..., 19
   2. for numbers ending with 1
   3. for rest

When the number of rules is finite and it's possible to create
a math function which translates any number to a rule number then
this rule number can be used to index alternative translations.
gettext allows introducing such a rule in the translation definition
as a C expression with one variable 'n' which holds the number
passed as an additional gettext parameter. We can use Clipper syntax
to define such an expression.
F.e. for Polish I can define rule indexing plural forms as:
   iif( n == 1, 1, iif( n % 10 >= 2 .and. n % 10 <= 4 .and. ;
                        ( n % 100 < 10 .or. n%100 >= 20 ), 2, 3 ) )
and for Lithuanian:
   iif( n % 100 >= 10 .and. n % 100 < 20, 1, iif( n % 10 == 1, 2, 3 ) )

Then we only have to introduce support for defining and storing
alternative translations which will be chosen by an index created
from one of the translation function parameters. We can define that
when the 1-st parameter of hb_i18n_gettext() is a numeric value then
it's the extended form which allows indexed translation. The index is
created from the 1-st parameter using a plural index expression,
like the ones above, defined in the translation rule.
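
The two iif() rules and the extended call form can be sketched in Python as follows. The function names, the "house" msgid and the table layout are made up for the example; only the rule arithmetic and the dom/domy/domów forms come from the text above.

```python
def plural_index_pl(n):
    """Polish rule from above: 1 for n == 1, 2 for 2-4 (not 12-14), else 3."""
    if n == 1:
        return 1
    if 2 <= n % 10 <= 4 and (n % 100 < 10 or n % 100 >= 20):
        return 2
    return 3

def plural_index_lt(n):
    """Lithuanian rule from above: 1 for 10-19, 2 for numbers ending in 1, else 3."""
    if 10 <= n % 100 < 20:
        return 1
    return 2 if n % 10 == 1 else 3

# Illustrative table: each msgid maps to its alternative translations.
TRANSLATIONS = {"house": ["dom", "domy", "domów"]}

def gettext_sketch(first, *rest, rule=plural_index_pl, table=TRANSLATIONS):
    """If the first parameter is numeric, treat the call as the extended form."""
    if isinstance(first, (int, float)):
        n, msgid = first, rest[0]
        forms = table.get(msgid)
        return msgid if forms is None else forms[rule(n) - 1]
    forms = table.get(first)
    # Assumption: the plain form returns the first stored translation.
    return first if forms is None else forms[0]
```

Because the rule only yields an index, adding plural support later changes the stored translation item, not the lookup itself.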

I can try to implement it but just like you I can also live without it,
though sometimes that reduces the translation quality. I would simply
like to know if someone will really use it.

>> 8. The C interface should allow to use different low level implementation
>>    so if someone will want to use real gettext API then it can register
>>    his own wrappers.
> Do you mean just an overloading of i18n C function by other module, or some 
> more complex possibility to have alternative gettext?

I'm thinking about a simple table of function addresses which can be
overloaded using some public function so it will be possible to
replace our default functions with user ones.
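
As a rough illustration of such a table (in Python, with a dict standing in for a C struct of function pointers), the names i18n_register and i18n_gettext below are hypothetical and only show the shape of the mechanism.

```python
def _default_gettext(msgid):
    """Built-in implementation; here it simply returns the msgid unchanged."""
    return msgid

# The dispatch table: one slot per replaceable low-level function.
_I18N_FUNCS = {"gettext": _default_gettext}

def i18n_register(name, func):
    """Public hook: replace one slot and return the previous function."""
    if name not in _I18N_FUNCS:
        raise KeyError(name)
    previous = _I18N_FUNCS[name]
    _I18N_FUNCS[name] = func
    return previous

def i18n_gettext(msgid):
    """All callers go through the table, so a replacement takes effect at once."""
    return _I18N_FUNCS["gettext"](msgid)
```

Returning the previous function lets a wrapper chain to the default implementation or restore it later.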

>> 9. We should add support for automatic CP translations in output strings.
>>    Otherwise we will end with many different lang modules for different
>>    encoding like in msg*.c files. There is a question if we also want to
>>    add translation for input strings but I do not think it's very important.
>>    We can leave it open for future decisions.
> According to http://www.gnu.org/software/gettext/manual/gettext.html 
> 11.2.4: Note that the msgid argument to gettext is not subject to character 
> set conversion. Also, when gettext does not find a translation for msgid, 
> it returns msgid unchanged – independently of the current output 
> character set. It is therefore recommended that all msgids be US-ASCII 
> strings.
> So, I think we do not need to convert input strings.

I know about it and I agree that it's not strictly necessary but
I do not know if people who do not use Latin-based languages will
share this opinion.

>> 7. We should give more precise meaning for domain names in our
>>    implementation.
>>    I think that using it as language ID is quite good idea.
>> 10. We should decide about global settings which will control the translation
>>     module:
>>       - default domain/language
>>       - default path with translation file
>>     If we want to make them thread local then they should be controlled by
>>     _SET_* structure. In such case they will be inherited by child threads.
> According to this document domains are used as a synonym for package or
> library. The default domain is "messages". My knowledge about original
> gettext is only theoretical. All I know is
> http://www.gnu.org/software/gettext/manual/gettext.html, but I've never
> used it in real life. So, it is hard for me to say if we can mix domain and
> language. They are different things in gettext.

Yes. But original gettext uses the locale setting to determine the language
and output CP. A domain is necessary because translations are stored
in common system directory which can be shared between different
computers and even OS-es. If you have set of similar tools then
they can share translation tables because many of strings will be
the same. You can also create a program which will use messages existing
in some domains so instead of creating your own translations you can
simply use them and only add a few messages unique to your program if
necessary. In such case you can benefit from other people's translations.
It's a nice feature when you are creating programs which should be
well integrated with system. It's the reason why I want to keep easy
way to replace our own i18n module by alternative one like wrapper to
real gettext. Probably I'll create such a wrapper. Anyhow it will not be
a good solution for people wanting to distribute their programs as
self-contained packages which will use the same translation tables in different
OS-es. gettext dependencies will not be well seen in MS-Windows world ;-)
People should have an option. F.e. when I want to create some system
tools using Harbour which will have to be integrated with other tools
like WWW server or RDBMS then I'd choose original gettext. Otherwise I'd
prefer our own integrated solution because I can easily use the same
translation files on other platforms supported by Harbour. I can also
introduce some extensions like automatic translation updating from
my host so new translations will be available for users automatically.

In our own implementation we can define a different meaning for domain
than the one which exists in gettext, where inside each domain you
have translations for different languages. When someone decides to
use our own i18n implementation then he usually needs it only for his own
application and does not need domains defined like for gettext.
We can also use domains for context translations.
It's our choice.

> BTW, in my application I had more need for context than domain, because
> sometimes the same word needs to have different translations, e.g., "Exit"
> has a different translation depending on its meaning: "An exit" or "To
> exit".
> I'm currently using my hackish context implementation, but it would be nice
> if Harbour's i18n supported contexts.

And we can use domain as context signature for alternative translations.
In such case when domain is given we will look for translation in given
domain and if it's not found then in main domain.
To resolve context problem we can also define some dummy pattern which
will be eliminated by our own hb_strFormat() function and use it to create
unique strings, f.e.:
   "[EMAIL PROTECTED]"
   "[EMAIL PROTECTED]"

%<n>@ will be replaced by empty string.
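
Both context ideas can be sketched in a few lines of Python. The domain names, the table contents and the sample strings "%1@Exit" / "%2@Exit" are invented for this sketch; only the fallback order and the %<n>@ pattern follow the description above.

```python
import re

# Matches the dummy context marker %<n>@, e.g. "%1@" or "%2@".
_CONTEXT_MARK = re.compile(r"%\d+@")

def strip_context(text):
    """Remove every %<n>@ marker, as the formatting function would."""
    return _CONTEXT_MARK.sub("", text)

MAIN_DOMAIN = "main"  # hypothetical name for the main domain

# Illustrative tables: a domain acts as the context signature.
TABLES = {
    "main": {"Exit": "Wyjście", "File": "Plik"},
    "verb": {"Exit": "Wyjdź"},
}

def gettext_ctx(msgid, domain=None):
    """Look in the given domain first, then fall back to the main domain."""
    if domain is not None:
        hit = TABLES.get(domain, {}).get(msgid)
        if hit is not None:
            return hit
    # Not found anywhere: return the msgid unchanged, as gettext does.
    return TABLES[MAIN_DOMAIN].get(msgid, msgid)
```

With the marker approach an untranslated "%1@Exit" still prints as plain "Exit" after strip_context(), so the marker never leaks to the user.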

>> I can implement the base C code when I'll hear your opinion about above
>> points.
>> It will be good if you or someone else can work on hb_strFormat() and
>> hbi18n tool for creating translation.
> I'll try to do as much as possible. Implementation of API to create/edit
> translation table is not a problem. I only doubt my ability to
> implement the final tool, because I was always using my own GUI library,
> and I've never tried to do browse() or @ 1, 1, SAY "Hello".

Thank you very much.
This weekend or at the beginning of next week I'll try to create some
basic core version.

best regards,
Przemek
_______________________________________________
Harbour mailing list
Harbour@harbour-project.org
http://lists.harbour-project.org/mailman/listinfo/harbour

Re: [Harbour] 2008-11-01 21:13 UTC+0100 Przemyslaw Czerpak (druzus/at/priv.onet.pl)
