On Tuesday, 27 August 2019 10:38:36 UTC+2, Nigel Tao wrote:
>
> Nice work, Klaus!
>
> On Mon, Aug 26, 2019 at 8:29 PM Klaus Post <klau...@gmail.com> wrote:
> > Concurrent stream compression - several GB/sec.
> > Faster decompression
>
> A number of modern compression formats and implementations allow for
> concurrent *compression*.
>
> Coincidentally, I've been working on a format that allows for
> concurrent *decompression*.
>
The streaming format of Snappy (and S2 by inheritance) does allow both
concurrent decompression and fast forward seeking, since the compressed
block size is known.

I have implemented the skip method here:
https://godoc.org/github.com/klauspost/compress/s2#Reader.Skip - it skips
blocks until it actually needs to decode, so everything skipped is read at
basically IO speed (a short usage sketch is at the end of this message).
It will not allow arbitrary seeking. That would require an index to be
stored at the end... and then you are not really dealing with streams any
more.

I have not written a concurrent decompressor yet, since I suspect the gain
would be fairly small given how fast decompression already is. It would be
fairly trivial to split an incoming stream into blocks and decode them
separately.

> Well, actually, my primary motivation is random access
> (de)compression: being able to reconstruct part of the decompressed
> file, starting at a given offset into the decompressed file, without
> always having to first decompress all of the preceding data. But being
> able to decompress parts of a compressed file independently means that
> we can decompress in parallel. In the example below, RAC + ZLIB
> decompressed about 8x faster (wall clock time) than ZIP, even though
> (1) they're both based on DEFLATE, and (2) the Go DEFLATE decoder
> isn't quite as fast as the C one.
>
> ----
> $ ls -l
> -rw-r--r-- 1 tao tao 100000000 Jun  2  2011 enwik8
> -rw-r--r-- 1 tao tao  36445475 Sep  2  2011 enwik8.zip
> -rw-r--r-- 1 tao tao  37320896 Aug  9 19:16 shared.rac
>
> $ cat enwik8 | sha256sum
> 2b49720ec4d78c3c9fabaee6e4179a5e997302b3a70029f30f2d582218c024a8  -
> $ unzip -p enwik8.zip | sha256sum
> 2b49720ec4d78c3c9fabaee6e4179a5e997302b3a70029f30f2d582218c024a8  -
> $ ractool -decode shared.rac | sha256sum
> 2b49720ec4d78c3c9fabaee6e4179a5e997302b3a70029f30f2d582218c024a8  -
>
> $ time unzip -p enwik8.zip > /dev/null
> real    0m0.737s
> user    0m0.713s
> sys     0m0.025s
>
> $ time ractool -decode shared.rac > /dev/null
> real    0m0.095s
> user    0m1.316s
> sys     0m0.069s
> ----
>
> More details are at https://godoc.org/github.com/google/wuffs/cmd/ractool
>
> In particular, RAC is a 'container format', not tied to any one
> underlying compression codec like DEFLATE / GZIP / ZLIB. We could
> easily have a "RAC + Snappy(2)" flavor.

Nice! There is definitely a good place for compression where you have
random access to the data.

Using Snappy/S2 blocks would be basically the same as the streaming
format. Actually, you could make a Snappy/S2 stream -> RAC converter,
which could reuse the already compressed stream.

The shared dictionary is a nice touch! Can the concurrent decompression
run on a pure stream, or does it need to read the index first?

Oh, and while I have you here: I have sent in a few PRs with some speedups
for Snappy that I found while writing this:
https://github.com/golang/snappy/pulls - they are fairly trivial.

/Klaus
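P.S. Here is a minimal sketch of the Skip-based fast forwarding mentioned
above, from the caller's side. It assumes a hypothetical file "stream.s2"
written with the S2 stream encoder; the offset and read size are made up
for illustration.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"os"

	"github.com/klauspost/compress/s2"
)

func main() {
	f, err := os.Open("stream.s2") // hypothetical S2-framed stream
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	r := s2.NewReader(f)

	// Skip the first 1 GiB of *decompressed* output. Skipped blocks are
	// read but not decoded, so this runs at roughly IO speed.
	if err := r.Skip(1 << 30); err != nil {
		log.Fatal(err)
	}

	// Read the next 64 KiB of decompressed data from that offset.
	buf := make([]byte, 64<<10)
	if _, err := io.ReadFull(r, buf); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("read %d bytes starting at decompressed offset %d\n", len(buf), 1<<30)
}
```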