New submission from Antoine Pitrou <pit...@free.fr>:

On an 8GB RAM box (more than 6GB free), serializing many small objects can eat all memory, while the end result would take around 600MB on a UCS2 build:
$ LANG=C time opt/python -c "import json; l = [1] * (100*1024*1024); encoded = json.dumps(l)"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/antoine/cpython/opt/Lib/json/__init__.py", line 224, in dumps
    return _default_encoder.encode(obj)
  File "/home/antoine/cpython/opt/Lib/json/encoder.py", line 188, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/home/antoine/cpython/opt/Lib/json/encoder.py", line 246, in iterencode
    return _iterencode(o, 0)
MemoryError
Command exited with non-zero status 1
11.25user 2.43system 0:13.72elapsed 99%CPU (0avgtext+0avgdata 27820320maxresident)k
2920inputs+0outputs (12major+1261388minor)pagefaults 0swaps

I suppose the encoder internally builds a large list of very small unicode objects and only joins them at the end. Probably we could join them in chunks so as to avoid this behaviour.

----------
messages: 142338
nosy: ezio.melotti, pitrou, rhettinger
priority: normal
severity: normal
status: open
title: JSON-serializing a large container takes too much memory
type: resource usage
versions: Python 3.3

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue12778>
_______________________________________
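A minimal sketch of the chunked-join idea described above, using only the public json API (JSONEncoder.iterencode); the dumps_chunked helper and the CHUNK_SIZE value are hypothetical illustrations, not part of the stdlib or of any proposed patch:

    import json

    CHUNK_SIZE = 10000  # hypothetical threshold: fragments to accumulate before joining

    def dumps_chunked(obj, chunk_size=CHUNK_SIZE):
        """Serialize obj to JSON while bounding the number of tiny string
        fragments kept alive at any one time."""
        pieces = []   # a few large, already-joined strings
        buf = []      # pending small fragments from the encoder
        for fragment in json.JSONEncoder().iterencode(obj):
            buf.append(fragment)
            if len(buf) >= chunk_size:
                # Collapse the pending fragments into one string, freeing the
                # per-object overhead of thousands of tiny str objects.
                pieces.append(''.join(buf))
                buf.clear()
        if buf:
            pieces.append(''.join(buf))
        return ''.join(pieces)

    # Example: encoded = dumps_chunked([1] * (100 * 1024 * 1024))

This does not shrink the final string, but it avoids holding hundreds of millions of tiny str objects, each with dozens of bytes of per-object overhead, in one giant list before the final join.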