Antoine Pitrou <pit...@free.fr> added the comment:

> The gz in question is 17mb compressed and 247mb uncompressed. Calling
> zcat the python process uses between 250 and 260 mb with the whole
> string in memory using zcat as a fork. Numbers for the gzip module
> aren't obtainable except for readline(), which doesn't use much memory
> but is very slow. Other methods thrash the machine to death.
>
> The machine has 300mb free RAM from a total of 1024mb.
That would be the explanation. Reading the whole file at once and then doing splitlines() on the result consumes twice the memory, since a list of lines must be constructed while the original data is still around. If you had more than 600 MB free RAM the splitlines() solution would probably be adequate :-)

Doing repeated calls to splitlines() on chunks of limited size (say 1MB) would probably be fast enough without using too much memory. It would be a bit less trivial to implement though, and it seems you are ok with the subprocess solution.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue7471>
_______________________________________
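For what it's worth, the chunked approach could be sketched roughly as below. This is only an illustration of the idea, not a patch; the helper name and the 1 MB default are made up here. It reads the gzip stream in fixed-size chunks, splits each chunk into lines, and carries any incomplete trailing line over to the next chunk, so only about one chunk's worth of decompressed data is resident at a time:

```python
import gzip

def iter_gzip_lines(path, chunk_size=1024 * 1024):
    """Yield lines from a gzip file, decompressing ~chunk_size bytes at a time.

    Only the current chunk plus a possible partial trailing line is kept
    in memory, instead of the whole decompressed file.
    """
    leftover = b""
    with gzip.open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # keepends=True so lines round-trip exactly
            lines = (leftover + chunk).splitlines(True)
            # The last element may be an incomplete line; carry it over.
            if not lines[-1].endswith(b"\n"):
                leftover = lines.pop()
            else:
                leftover = b""
            for line in lines:
                yield line
    if leftover:
        yield leftover
```

The caller then iterates over lines as with readline(), but each f.read() pulls a whole chunk through zlib at once rather than hunting for one newline at a time, which is where readline()'s slowness comes from.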