New submission from Paul Ellenbogen <p...@cs.princeton.edu>:

Python encounters significant memory fragmentation when unpickling many small 
objects.

I have attached two scripts that I believe demonstrate the issue. When you run 
"dump.py" it will generate a large list of namedtuples, then write that list 
to a file using pickle. Before it does so, it pauses for user input. Before 
the script exits, you can view its memory usage in htop or whatever your 
preferred method is.
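
For readers without the attachment, here is a minimal sketch of what the dump 
side might look like. It is not the attached dump.py. The Entry type, its 
field names, the file name and the current_rss_bytes helper are all made up 
for illustration, and the helper assumes Linux (/proc/self/statm) as a 
stand-in for looking at htop.

```python
import os
import pickle
import tempfile
from collections import namedtuple

# Hypothetical record type: the report only says each object has 3 values.
Entry = namedtuple("Entry", ["a", "b", "c"])


def current_rss_bytes():
    """Return this process's resident set size in bytes, or None if unknown.

    Linux-only assumption: the second field of /proc/self/statm is the
    number of resident pages.
    """
    try:
        with open("/proc/self/statm") as f:
            resident_pages = int(f.read().split()[1])
        return resident_pages * os.sysconf("SC_PAGE_SIZE")
    except (OSError, ValueError, IndexError, AttributeError):
        return None


def dump_entries(n, path):
    """Build n namedtuples, pickle the list to path, and return the list."""
    entries = [Entry(i, str(i), float(i)) for i in range(n)]
    with open(path, "wb") as f:
        pickle.dump(entries, f)
    return entries


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        entries = dump_entries(100_000, os.path.join(tmp, "entries.pkl"))
        # The report's script pauses for user input so memory can be read
        # externally; this sketch prints the RSS instead.
        print("entries:", len(entries), "RSS bytes:", current_rss_bytes())
```

Keeping the list alive while reading the RSS matters, since the comparison in 
the report is about memory held while the data is still referenced.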

The "load.py" script loads the file written by dump.py. Once loading is 
complete, it waits for user input. At that point, memory usage is (more than) 
twice as high in the "load" case as in the "dump" case.
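
One way to compare the two cases inside a single process is sketched below, 
assuming tracemalloc is an acceptable substitute for watching htop. Note that 
tracemalloc counts bytes requested through Python's allocators, not the 
process RSS, so it can show extra allocations but cannot show fragmentation 
directly. The Record type and the function names are hypothetical.

```python
import os
import pickle
import tempfile
import tracemalloc
from collections import namedtuple

# Hypothetical record type; the report only says each object holds 3 values.
Record = namedtuple("Record", ["a", "b", "c"])


def build_records(n):
    """Create n small namedtuple objects in memory."""
    return [Record(i, i + 1, i + 2) for i in range(n)]


def traced_size(make):
    """Run make() under tracemalloc; return (result, bytes still allocated)."""
    tracemalloc.start()
    try:
        result = make()
        current, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, current


def compare(n, path):
    """Return (bytes held after building, bytes held after unpickling)."""
    records, built = traced_size(lambda: build_records(n))
    with open(path, "wb") as f:
        pickle.dump(records, f)

    def load():
        with open(path, "rb") as f:
            return pickle.load(f)

    loaded, unpickled = traced_size(load)
    assert loaded == records  # the round trip must preserve the data
    return built, unpickled


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        built, unpickled = compare(100_000, os.path.join(tmp, "records.pkl"))
        print("built:", built, "unpickled:", unpickled)
```

If the two numbers are close while the RSS figures still differ, that would 
point toward allocator behaviour rather than larger objects.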

The small objects in the list I am storing each have 3 values, and I have 
tested three alternative representations: tuple, namedtuple, and a custom 
class. The namedtuple and the custom class both have the memory 
use/fragmentation issue. The built-in tuple type does not. Using 
pickletools.optimize doesn't seem to make a difference.
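
The three representations can be set side by side roughly as follows. This is 
only a sketch: PointNT and PointClass are hypothetical names, and it reports 
pickle sizes with and without pickletools.optimize rather than process memory, 
so it illustrates the setup, not the bug itself.

```python
import pickle
import pickletools
from collections import namedtuple

PointNT = namedtuple("PointNT", ["x", "y", "z"])  # hypothetical name


class PointClass:
    """Hypothetical custom class holding the same three values."""

    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z

    def __eq__(self, other):
        return isinstance(other, PointClass) and vars(self) == vars(other)


def make_lists(n):
    """Build the same n records in each of the three representations."""
    return {
        "tuple": [(i, i + 1, i + 2) for i in range(n)],
        "namedtuple": [PointNT(i, i + 1, i + 2) for i in range(n)],
        "class": [PointClass(i, i + 1, i + 2) for i in range(n)],
    }


def pickle_sizes(n):
    """Map each representation to (raw pickle bytes, optimized pickle bytes)."""
    sizes = {}
    for name, items in make_lists(n).items():
        raw = pickle.dumps(items)
        optimized = pickletools.optimize(raw)
        # Both forms must decode to the same data.
        assert pickle.loads(raw) == items
        assert pickle.loads(optimized) == items
        sizes[name] = (len(raw), len(optimized))
    return sizes


if __name__ == "__main__":
    for name, (raw, optimized) in pickle_sizes(10_000).items():
        print(name, raw, optimized)
```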

Matthew Cowles from the Python help list had some good suggestions, and found 
that the object sizes themselves, as reported by sys.getsizeof, were different 
before and after pickling. Perhaps this is something other than memory 
fragmentation, or something in addition to memory fragmentation.
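
A check along those lines might look like the sketch below. I do not know 
exactly what was measured on the help list, so this is an assumption: it 
compares sys.getsizeof for an object and, where one exists, its instance 
__dict__, before and after a pickle round trip. sys.getsizeof is shallow, so 
the __dict__ is measured separately. Sample and SampleClass are hypothetical 
names.

```python
import pickle
import sys
from collections import namedtuple

Sample = namedtuple("Sample", ["a", "b", "c"])  # hypothetical name


class SampleClass:
    """Hypothetical custom class with three attributes."""

    def __init__(self, a, b, c):
        self.a, self.b, self.c = a, b, c


def shallow_sizes(obj):
    """Return sys.getsizeof for obj and, if present, its instance __dict__."""
    sizes = {"object": sys.getsizeof(obj)}
    if hasattr(obj, "__dict__"):
        sizes["__dict__"] = sys.getsizeof(obj.__dict__)
    return sizes


def before_and_after(obj):
    """Return shallow sizes of obj and of its unpickled copy."""
    copy = pickle.loads(pickle.dumps(obj))
    return shallow_sizes(obj), shallow_sizes(copy)


if __name__ == "__main__":
    for obj in (Sample(1, 2, 3), SampleClass(1, 2, 3)):
        before, after = before_and_after(obj)
        print(type(obj).__name__, "before:", before, "after:", after)
```

Whether the numbers differ may depend on the Python version, so the sketch 
prints them rather than assuming a result.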

Although the high-water mark is similar for both scripts, the pickling script 
settles down to a considerably smaller memory footprint. I would still 
consider the long-run memory waste of unpickling a bug. For example, in my use 
case I will run one instance of the equivalent of the pickling script, then 
run many instances of the script that unpickles.


These scripts were run with Python 3.6.7 (GCC 8.2.0) on Ubuntu 18.10.

----------
components: Library (Lib)
files: dump.py
messages: 340615
nosy: Ellenbogen, alexandre.vassalotti
priority: normal
severity: normal
status: open
title: Excessive memory use or memory fragmentation when unpickling many small 
objects
type: resource usage
versions: Python 3.6
Added file: https://bugs.python.org/file48278/dump.py

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue36694>
_______________________________________
