On Tuesday, 27 August 2019 10:38:36 UTC+2, Nigel Tao wrote:
>
> Nice work, Klaus! 
>
>
> On Mon, Aug 26, 2019 at 8:29 PM Klaus Post <klau...@gmail.com> wrote: 
> > Concurrent stream compression - several GB/sec. 
> > Faster decompression 
>
> A number of modern compression formats and implementations allow for 
> concurrent *compression*. 
>
> Coincidentally, I've been working on a format that allows for 
> concurrent *decompression*. 
>

The streaming format of Snappy (and S2 by inheritance) does allow for both 
concurrent decompression and fast forward seeking since the compressed 
block size is known.

I have implemented the Skip method here: 
https://godoc.org/github.com/klauspost/compress/s2#Reader.Skip - it will 
skip blocks until it actually needs to decode, so everything skipped is 
basically read at I/O speed. It will not allow arbitrary seeking; that would 
require an index to be stored at the end... and then you are not really 
dealing with streams any more.
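
For illustration, something like this shows how Skip is used (a minimal 
sketch of my own; the file name and offsets are just examples, not taken 
from the package docs):

----
package main

import (
	"io"
	"log"
	"os"

	"github.com/klauspost/compress/s2"
)

func main() {
	f, err := os.Open("stream.s2") // hypothetical input file
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	r := s2.NewReader(f)

	// Skip the first 1 GiB of *decompressed* output. Blocks that fall
	// entirely inside the skipped range are never decoded, so this runs
	// at close to I/O speed.
	if err := r.Skip(1 << 30); err != nil {
		log.Fatal(err)
	}

	// Decode only the data we actually want.
	if _, err := io.CopyN(os.Stdout, r, 4096); err != nil && err != io.EOF {
		log.Fatal(err)
	}
}
----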

I have not done a concurrent decompressor yet, since I suspect the gain 
will be fairly minimal due to the already high speed of decompression. It 
would be fairly trivial to split an incoming stream into blocks and decode 
them separately.
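
To sketch what I mean (my own illustration, nothing that exists in the 
package): the framing format prefixes every chunk with a 4-byte header - one 
byte of chunk type and a 3-byte little-endian length - so a reader can walk 
the chunk boundaries without decompressing anything, and each compressed 
chunk could then be handed to its own goroutine:

----
package main

import (
	"bufio"
	"fmt"
	"io"
	"log"
	"os"
)

func main() {
	f, err := os.Open("stream.s2") // hypothetical input file
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	br := bufio.NewReader(f)
	for {
		// Every chunk starts with 1 byte of chunk type followed by a
		// 3-byte little-endian length of the chunk data.
		var hdr [4]byte
		if _, err := io.ReadFull(br, hdr[:]); err == io.EOF {
			break
		} else if err != nil {
			log.Fatal(err)
		}
		chunkType := hdr[0]
		chunkLen := int(hdr[1]) | int(hdr[2])<<8 | int(hdr[3])<<16

		chunk := make([]byte, chunkLen)
		if _, err := io.ReadFull(br, chunk); err != nil {
			log.Fatal(err)
		}
		// Type 0x00 is compressed data, 0x01 is uncompressed data and
		// 0xff is the stream identifier. The compressed chunks are the
		// ones that could be farmed out to worker goroutines.
		fmt.Printf("chunk type 0x%02x, %d bytes\n", chunkType, chunkLen)
	}
}
----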

>
> Well, actually, my primary motivation is random access 
> (de)compression: being able to reconstruct part of the decompressed 
> file, starting at a given offset into the decompressed file, without 
> always having to first decompress all of the preceding data. But being 
> able to decompress parts of a compressed file independently means that 
> we can decompress in parallel. In this example below, RAC + ZLIB 
> decompressed about 8x faster (wall clock time) than ZIP, even though 
> (1) they're both based on DEFLATE, and (2) the Go DEFLATE decoder 
> isn't quite as fast as the C one. 


> ---- 
> $ ls -l 
> -rw-r--r-- 1 tao tao 100000000 Jun  2  2011 enwik8 
> -rw-r--r-- 1 tao tao  36445475 Sep  2  2011 enwik8.zip 
> -rw-r--r-- 1 tao tao  37320896 Aug  9 19:16 shared.rac 
>
> $ cat enwik8                  | sha256sum 
> 2b49720ec4d78c3c9fabaee6e4179a5e997302b3a70029f30f2d582218c024a8  - 
> $ unzip -p enwik8.zip         | sha256sum 
> 2b49720ec4d78c3c9fabaee6e4179a5e997302b3a70029f30f2d582218c024a8  - 
> $ ractool -decode shared.rac  | sha256sum 
> 2b49720ec4d78c3c9fabaee6e4179a5e997302b3a70029f30f2d582218c024a8  - 
>
> $ time unzip -p enwik8.zip        > /dev/null 
> real    0m0.737s 
> user    0m0.713s 
> sys     0m0.025s 
> $ time ractool -decode shared.rac > /dev/null 
> real    0m0.095s 
> user    0m1.316s 
> sys     0m0.069s 
> ---- 
>
> More details are at https://godoc.org/github.com/google/wuffs/cmd/ractool 
>
> In particular, RAC is a 'container format', not tied to any one 
> underlying compression codec like DEFLATE / GZIP / ZLIB. We could 
> easily have a "RAC + Snappy(2)" flavor. 
>

Nice!

There is definitely a place for compression that gives you random access to 
the data. Using Snappy/S2 blocks would be basically the same as the 
streaming format. In fact, you could make a Snappy/S2 stream -> RAC 
converter that reuses the already-compressed stream.

The shared dictionary is a nice touch!

Can the concurrent decompression run on a pure stream, or does it need to 
read the index first?

Oh yeah, now that I have you here: I have sent in a few PRs with some 
speedups for Snappy that I found while writing this: 
https://github.com/golang/snappy/pulls - they are fairly trivial. 

 /Klaus
