Kristján Valur Jónsson added the comment:

Basically, reuse of strings (and preservation of their interned status) fell 
by the wayside somewhere in the 3.x transition.  In 2.x, strings have been 
reused, and interned strings re-interned, since version 1 of the marshal 
format.  This patch adds that feature back, and uses the same mechanism to 
reuse not only strings, but also any other multiply-referenced object.
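
A minimal sketch of the reuse described above, assuming a CPython whose 
marshal module supports version 3 (3.4 or later).  The names inner and data 
are made up for illustration and are not taken from the patch.

```python
import marshal

# A list that is referenced twice from the same container.
inner = [1, 2, 3]
data = [inner, inner]

# Version 2 has no back-reference mechanism for lists: each occurrence
# is written out in full.
blob_v2 = marshal.dumps(data, 2)
# Version 3 can write the second occurrence as a reference to the first.
blob_v3 = marshal.dumps(data, 3)

loaded_v2 = marshal.loads(blob_v2)
loaded_v3 = marshal.loads(blob_v3)

print("v2 preserves sharing:", loaded_v2[0] is loaded_v2[1])
print("v3 preserves sharing:", loaded_v3[0] is loaded_v3[1])
print("sizes (v2, v3):", len(blob_v2), len(blob_v3))
```

Both blobs load to equal values; the difference is only whether the two 
elements come back as one object or as two separate copies.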

It is not desirable to simply intern all strings that are read from marshaled 
data.  Only selected strings are interned by Python during compilation, and we 
want to keep it that way.  Also, 2.x reuses not only interned strings but other 
strings as well.
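
The distinction between interned and non-interned strings can be checked 
with a small sketch like the following.  It assumes CPython with marshal 
version 3 available; the string contents are arbitrary and hypothetical.

```python
import marshal
import sys

# Build the strings at run time so the compiler does not intern them for us.
plain = "".join(["marshal", "_reuse", "_demo_plain"])
interned = sys.intern("".join(["marshal", "_reuse", "_demo_interned"]))

plain_copy = marshal.loads(marshal.dumps(plain, 3))
interned_copy = marshal.loads(marshal.dumps(interned, 3))

# A non-interned string should come back as an equal but distinct object.
print("plain string re-used:", plain_copy is plain)
# An interned string should be re-interned on load, which yields the
# object that is already in the intern table.
print("interned string re-used:", interned_copy is interned)
```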

Generalizing reuse of strings to other objects is trivial, and a logical step 
forward.  This allows optimizations to be made on code objects where common 
data are identified and shared as a single instance, and those code objects to 
be saved and reloaded with that sharing intact.

But even without such code-object optimization, the changes are significant: 
the size of the marshaled code object of lib/test/test_marshal drops from 
24093 bytes in version 2 to 17841 bytes with version 3, without any additional 
massaging of the module code object.
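
A comparable measurement can be sketched on any code object.  The snippet 
below is an illustration only: SOURCE and the file name are hypothetical, 
and the byte counts depend on the interpreter version, so none are quoted 
here.

```python
import marshal

# A tiny module with two functions; the file name and the argument names
# are repeated across the nested code objects.
SOURCE = '''
def area(width, height):
    return width * height

def perimeter(width, height):
    return 2 * (width + height)
'''

code = compile(SOURCE, "<marshal-size-demo>", "exec")

blob_v2 = marshal.dumps(code, 2)
blob_v3 = marshal.dumps(code, 3)

print("version 2:", len(blob_v2), "bytes")
print("version 3:", len(blob_v3), "bytes")

# The version 3 blob still loads to a working code object.
namespace = {}
exec(marshal.loads(blob_v3), namespace)
print("area(2, 3) =", namespace["area"](2, 3))
```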

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue16475>
_______________________________________