On 05/10/17 20:38, Stephan Houben wrote:
> On 2017-10-05, Thomas Nyberg wrote <tomuxi...@gmx.com>:
>> Btw, if anyone knows a better way to handle this sort of thing, I'm all
>> ears. Given my current implementation I could use any compression that
>> works with stdin/stdout, as long as I could sort out the waiting on the
>> subprocess. In fact, bzip2 is probably more than I need... I've half used
>> it out of habit rather than anything else.
> lzma ("xz" format) compression is generally both better and faster than
> bzip2, so that already gives you some advantage.
>
> Moreover, the Python lzma docs say:
>
> "When opening a file for reading, the input file may be the concatenation
> of multiple separate compressed streams. These are transparently decoded
> as a single logical stream."
>
> This seems to open the possibility of simply dividing your input into,
> say, 100 MB blocks, compressing each of them in a separate thread/process,
> and then concatenating them.
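For illustration, the concatenation trick quoted above can be sketched with the standard-library lzma module. The block contents and sizes here are made up for the example; the thread suggests ~100 MB blocks in practice:

```python
import io
import lzma
from concurrent.futures import ThreadPoolExecutor

# Hypothetical input data, split into blocks up front.
blocks = [b"block one\n" * 1000, b"block two\n" * 1000, b"block three\n" * 1000]

# Compress each block as an independent xz stream, in parallel.
# (A ProcessPoolExecutor would work the same way; threads are used
# here only to keep the sketch self-contained.)
with ThreadPoolExecutor() as pool:
    streams = list(pool.map(lzma.compress, blocks))

# Concatenate the compressed streams into a single "file".
payload = b"".join(streams)

# Per the docs quoted above, lzma.open() transparently decodes the
# concatenation of multiple streams as one logical stream.
with lzma.open(io.BytesIO(payload)) as f:
    assert f.read() == b"".join(blocks)
```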
Perhaps, but:

- this will probably be less space-efficient (by a small enough fraction, if your blocks are large enough);
- this *might* be a Python implementation detail (I doubt that, but who knows);
- this obviously won't work for decompression unless you know a priori that there are division points, and where they are.

Thomas
-- 
https://mail.python.org/mailman/listinfo/python-list