New submission from s-ball <s-b...@laposte.net>:
When working on i18n, I realized that msgfmt.py did not generate any hash table. One step further, I realized that the gettext.py would not have used it because it unconditionnaly loads the whole translation files and contains the following TODO message: TODO: - Lazy loading of .mo files. Currently the entire catalog is loaded into memory, but that's probably bad for large translated programs. Instead, the lexical sort of original strings in GNU .mo files should be exploited to do binary searches and lazy initializations. Or you might want to use the undocumented double-hash algorithm for .mo files with hash tables, but you'll need to study the GNU gettext code to do this. I have studied the code, and found that it should not be too complex to implement it in pure Python. I have posted a message on python-ideas about it and here are my conclusion: Features: ======== The gettext module should be allowed to load lazily the catalogs from mo file. This lazy load should be optional and make use of the hash tables from mo files when they are present or revert to a binary search. The translation strings should be cached for better performances. API changes: ============ 3 functions from the gettext module will have 2 new optional parameter named caching, and keepopen: gettext.bindtextdomain(domain, localedir=None) would become gettext.bindtextdomain(domain, localedir=None, caching=None, keepopen=False) gettext.translation(domain, localedir=None, languages=None, class_=None, fallback=False, codeset=None) would become gettext.translation(domain, localedir=None, languages=None, class_=None, fallback=False, codeset=None, caching=None, keepopen=False) gettext.install(domain, localedir=None, codeset=None, names=None) would become gettext.install(domain, localedir=None, codeset=None, names=None, caching=None, keepopen=False) The new caching parameter could receive the following values: caching=None: revert to the previour eager loading of the full catalog. It will be the default to allow previous application to see no change caching=1: lazy loading with unlimited cache caching=n where n is a positive (>=0) integer value: lazy loading with a LRU cache limited to n strings The keepopen parameter would be a boolean: keepopen=False (default): the mo file is only opened before loading a translation string and closed immediately after - it is also opened once when the GNUTranslation class is initialized to load the file description keepopen=True: the mo file is kept open during the lifetime of the GNUTranslation object. This parameter is ignored and not used if caching is None Implementation: ============== The current GNUTranslation class loads the content of the mo file to build a dictionnary where the original strings are the keys and the translated keys the values. Plural forms use a special processing: the key is a 2 tuple (singular original string, order), and the value is the corresponding translated string - order=0 is normally for the singular translated string. The proposed implementation would simply replace this dictionary with a special mapping subclass when caching is not None. That subclass would use same keys as the original directory and would: - first search in its cache - if not found in cache and if the hashtable has not a zero size search the original string by hash - if not found in cache and if the hashtable has a zero size, search the original string with a binary search algorithm. - if a string is found, it should feed the LRU cache, eventually throwing away the oldest entry (entries) That should allow to implement the new feature with minimal refactoring for the gettext module. But I also propose to change msgfmt.py to build the hashtable. IMHO, the function should lie in the standard library probably as a submodule of gettext to allow various Python projects (pybabel, django) to directly use it instead of developping their own ones. I will probably submit a PR in a while but it will will require some time to propose a full implementation with a correct test coverage. ---------- components: Library (Lib) messages: 332815 nosy: s-ball priority: normal severity: normal status: open title: Allow lazy loading of translations in gettext. type: enhancement versions: Python 3.8 _______________________________________ Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue35628> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com