When I try to run lzma compression in parallel with multiprocessing.Pool.map() (see the code below), it hangs for a very long time. The non-parallel versions using the built-in map() work fine, as the later examples show.
Python 3.3.2 [GCC 4.6.3] on linux

    import lzma
    from functools import partial
    import multiprocessing

    def run_lzma(data,c):
        return c.compress(data)

    def split_len(seq, length):
        return [str.encode(seq[i:i+length]) for i in range(0, len(seq), length)]

    def lzma_mp(sequence,threads=3):
        lzc = lzma.LZMACompressor()
        blocksize = int(round(len(sequence)/threads))
        strings = split_len(sequence, blocksize)
        lzc_partial = partial(run_lzma,c=lzc)
        pool=multiprocessing.Pool()
        lzc_pool = list(pool.map(lzc_partial,strings))
        pool.close()
        pool.join()
        out_flush = lzc.flush()
        return b"".join(lzc_pool + [out_flush])

    sequence = 'AAAAAJKDDDDDDDDDDDDDDDDDDDDDDDDDDDDGJFKSHFKLHALWEHAIHWEOIAH IOAHIOWEHIOHEIOFEAFEASFEAFWEWWWWWWWWWWWWWWWWWWWWWWWWWWWWWEWFQWEWQWQGEWQFEWFDWEWEGEFGWEG'
    lzma_mp(sequence,threads=3)

When using lzma and the map function it works fine.

    threads=3
    blocksize = int(round(len(sequence)/threads))
    strings = split_len(sequence, blocksize)
    lzc = lzma.LZMACompressor()
    out = list(map(lzc.compress,strings))
    out_flush = lzc.flush()
    result = b"".join(out + [out_flush])
    lzma.compress(str.encode(sequence))
    lzma.compress(str.encode(sequence)) == result

Map using partial function works fine as well.

    lzc = lzma.LZMACompressor()
    lzc_partial = partial(run_lzma,c=lzc)
    out = list(map(lzc_partial,strings))
    out_flush = lzc.flush()
    result = b"".join(out + [out_flush])
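For comparison, here is a minimal sketch of one way the parallel version might be restructured. It rests on an assumption rather than a confirmed diagnosis: pool.map() has to pickle lzc_partial, and with it the LZMACompressor, to send it to the workers, and a compressor's internal state cannot usefully be shared across processes that way. The sketch sidesteps this by letting each worker compress its block as an independent stream with lzma.compress(). The names compress_block and split_bytes and the processes parameter are made up for illustration.

```python
import lzma
import multiprocessing


def compress_block(block):
    # Each call produces a complete, independent .xz stream, so no
    # compressor object has to be pickled or shared between processes.
    return lzma.compress(block)


def split_bytes(data, length):
    # Cut a bytes object into consecutive chunks of at most `length` bytes.
    return [data[i:i + length] for i in range(0, len(data), length)]


def lzma_mp(sequence, processes=3):
    data = sequence.encode()
    # Ceiling division, with a floor of 1 so range() never gets a zero step.
    blocksize = max(1, -(-len(data) // processes))
    blocks = split_bytes(data, blocksize)
    with multiprocessing.Pool(processes) as pool:
        streams = pool.map(compress_block, blocks)
    # The result is several .xz streams back to back.
    return b"".join(streams)
```

lzma.decompress() is documented to accept a concatenation of several streams, so for non-empty input lzma.decompress(lzma_mp(sequence)) should give back sequence.encode(). The output will not be byte-identical to lzma.compress() on the whole input, and it will usually be somewhat larger, because each block is compressed without knowledge of the others. On platforms that start workers by spawning rather than forking, the call to lzma_mp() should sit under an if __name__ == "__main__": guard.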