priv.onet.pl)

Przemyslaw Czerpak Sat, 01 Nov 2008 16:24:54 -0700

On Sat, 01 Nov 2008, Szakáts Viktor wrote:

Hi Viktor,


> The other issue is that 2 dimensional array. Currently
> I'm changing it to 1 dimensional on load, and lookup
> is faster this way, and the whole translation array
> takes less space. Maybe it doesn't matter, I don't know.

With a hash array the search can be done without a function
call, by a simple hash lookup.

>
> BTW, I've now tested saved size with serialize as is
> (2 dimensional), and it's almost exactly the same as
> current __i18n_save(). This is pretty good. Then I've
> tested loading speed, and it's only the half for
> deserialize.
> _i18n_loadfrommemory() [1D]: 3.6s
> hb_deserialize() [2D]: 6.7s

Thanks for the tests.
The deserialization code always makes two passes.
The first one checks whether the passed data can be deserialized
without any errors, so some operations are doubled.
If we can "trust" that the data is valid then this pass can
be eliminated, though I do not think it's a good idea. Such
decoding can be done just once and the overhead is rather small,
so I prefer to validate the whole operation to avoid serious
problems like a GPF or an out-of-memory message when corrupted
data is passed.
The serialization code also has protection and support for
arrays and hashes with cyclic references, so it will not
GPF if you pass an array with a cyclic reference to the
serialization code. It will be stored correctly and the cyclic
references will be restored correctly during deserialization.
This protection also costs a little. Together these two things
create the speed difference.
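
A minimal sketch of both protections, untested like the rest of the
code in this message. It assumes that hb_deserialize() returns NIL
when the validation pass rejects the data; the variable names are
only illustrative.

   PROC Main()
      LOCAL aData := { "text", NIL }
      LOCAL cData, aCopy

      aData[ 2 ] := aData                 // the array now references itself
      cData := hb_serialize( aData )      // cyclic reference is detected and stored
      aCopy := hb_deserialize( cData )

      ? aCopy[ 1 ]                        // first element of the restored copy
      ? aCopy[ 2 ] == aCopy               // tests whether the self reference was restored

      // truncated data: the first (validation) pass should reject it
      ? hb_deserialize( left( cData, len( cData ) - 1 ) ) == NIL
      RETURN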

> 1000 iterations with strings preloaded from disk,
> for 9200 string pairs (682KB file for both) on a P4HT 2.6.
> I've repeated the tests by using flattened 1 dimensional
> array, which made the serialized files smaller than i18n
> functions, and it also made the loading as below:
> hb_deserialize() [1D]: 3.95s
> Which is pretty good, so the most optimal would be to use
> hb_serialize/deserialize with a flat array, flattened
> on save. (saving is not speed or memory critical)
> I didn't explore hash, as I have zero experience with them.

The most important thing is the performance of accessing I18N data
at runtime. Hash arrays allow access to translated strings
without a function call. They are also very easy to manage at
.prg level. E.g. this is a customized version of hbi18n.c
(without RT errors) which can be written by any user if
we use the standard serialization code and hashes.

   FUNC __I18N_SAVE( cFile, hTrans )
      return hb_memoWrit( cFile, hb_serialize( hTrans ) )

   FUNC __I18N_LOAD( cFile )
      return hb_deserialize( hb_memoRead( cFile ) )

   FUNC __I18N_LOADFROMMEMORY( cData )
      return hb_deserialize( cData )

   FUNC __I18N_GETTEXT( cText, hTrans )
      if cText $ hTrans
         cText := hTrans[ cText ]
      endif
      return cText

   PROC __I18N_ADDTRANS( hTrans, cText, cTrans )
      hTrans[ cText ] := cTrans

   FUNC __I18N_INITTRANS()
      return hb_hSetAutoAdd( { => } )

Isn't it simple?
For me such flexibility is very important.
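
A short usage sketch, assuming the six functions above are compiled
into the same program. The file name and the sample strings are
hypothetical, and this is untested as well.

   PROC Main()
      LOCAL hTrans := __I18N_INITTRANS()

      __I18N_ADDTRANS( hTrans, "File", "Plik" )
      __I18N_ADDTRANS( hTrans, "Edit", "Edycja" )

      __I18N_SAVE( "test.hbl", hTrans )      // hypothetical file name
      hTrans := __I18N_LOAD( "test.hbl" )

      ? __I18N_GETTEXT( "File", hTrans )     // entry exists: translation returned
      ? __I18N_GETTEXT( "Quit", hTrans )     // no entry: original text returned

      // the same lookup written inline, without any function call
      ? iif( "Edit" $ hTrans, hTrans[ "Edit" ], "Edit" )
      RETURN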
Do you want to add domain support? Just change two functions:

   FUNC __I18N_GETTEXT( cText, hTrans, cDomain )
      local hDomain
      if cDomain == NIL
         cDomain := "[MAIN]"
      endif
      if cDomain $ hTrans
         hDomain := hTrans[ cDomain ]
         if cText $ hDomain
            cText := hDomain[ cText ]
         endif
      endif
      return cText

   PROC __I18N_ADDTRANS( hTrans, cText, cTrans, cDomain )
      if cDomain == NIL
         cDomain := "[MAIN]"
      endif
      if !cDomain $ hTrans
         hTrans[ cDomain ] := hb_hSetAutoAdd( { => } )
      endif
      hTrans[ cDomain, cText ] := cTrans
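
With these two versions the table becomes a hash of hashes, one
inner hash per domain. A usage sketch under the same assumptions as
before (the "status" domain and the strings are hypothetical):

   PROC Main()
      LOCAL hTrans := __I18N_INITTRANS()

      __I18N_ADDTRANS( hTrans, "Open", "Otworz" )             // stored under "[MAIN]"
      __I18N_ADDTRANS( hTrans, "Open", "Otwarty", "status" )  // stored under "status"

      ? __I18N_GETTEXT( "Open", hTrans )                // looked up in "[MAIN]"
      ? __I18N_GETTEXT( "Open", hTrans, "status" )      // looked up in "status"
      ? __I18N_GETTEXT( "Open", hTrans, "unknown" )     // no such domain: "Open" returned
      RETURN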


And now we only have to create a new function __I18N_LOADPOT( cFile )
which will do something like:

   FUNC __I18N_LOADPOT( cFile )
      LOCAL hTrans := __I18N_INITTRANS()
      LOCAL cLine, cDomain, cText, cTrans

      FOR EACH cLine in hb_aTokens( memoread( cFile ), hb_osNewLine() )
         IF cLine = "msgctxt "
            cDomain := substr( cLine, 10, len( cLine ) - 10 )
         ELSEIF cLine = "msgid "
            cText := substr( cLine, 8, len( cLine ) - 8 )
         ELSEIF cLine = "msgstr "
            cTrans := substr( cLine, 9, len( cLine ) - 9 )
            IF !EMPTY( cText )
               IF EMPTY( cTrans )
                  cTrans := cText
               ENDIF
               __I18N_ADDTRANS( hTrans, cText, cTrans, cDomain )
               cText := cDomain := NIL
            ENDIF
         ENDIF
      NEXT
      RETURN hTrans
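
To show the input this loader expects, here is a sketch which writes
a tiny .pot file and reads it back. The file name and the entries are
hypothetical. Each keyword has to sit on a single line with its
string in double quotes; multi-line strings and escape sequences are
not handled by the code above.

   PROC Main()
      LOCAL cEol := hb_osNewLine()
      LOCAL hTrans
      LOCAL cPot := 'msgctxt "status"' + cEol + ;
                    'msgid "Open"'     + cEol + ;
                    'msgstr "Otwarty"' + cEol + ;
                    cEol + ;
                    'msgid "File"'     + cEol + ;
                    'msgstr ""'        + cEol

      hb_memoWrit( "test.pot", cPot )       // hypothetical file name
      hTrans := __I18N_LOADPOT( "test.pot" )

      ? __I18N_GETTEXT( "Open", hTrans, "status" )  // entry from the "status" domain
      ? __I18N_GETTEXT( "File", hTrans )            // empty msgstr: msgid stored as translation
      RETURN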

It's a simplified version with only very basic functionality and without
error reporting for wrong .pot files. Nothing above is tested (just
typed by hand) but it should work. Of course the .prg version is slower
than C code but it can very easily be written in C as well. What matters
is an easy-to-manage format.
Of course it will be a little bit slower than a dedicated format with
code optimized for it (BTW your code seems to be optimal, very nice
job) but the speed difference should not be noticeable in normal
applications.

best regards,
Przemek
_______________________________________________
Harbour mailing list
Harbour@harbour-project.org
http://lists.harbour-project.org/mailman/listinfo/harbour
