Re: [go-nuts] compressing long list of short strings

Dan Kortschak Wed, 10 Aug 2016 17:30:07 -0700

This looks like a problem that has already been solved for genomics
data. If you are OK with decompressing m strings per lookup, where
m << n, then the BGZF extension to gzip would work for you. In brief,
BGZF splits the data into independently compressed gzip blocks of at
most 64 KiB, and those blocks can be indexed.


The spec for BGZF is here [1] (section 4, from page 11 on). There is a
BGZF implementation here [2] and example indexing here [3] (the indexing
would need to be modified for your use case, since I wrote it for
genomic data).

[1] https://samtools.github.io/hts-specs/SAMv1.pdf
[2] https://godoc.org/github.com/biogo/hts/bgzf
[3] https://godoc.org/github.com/biogo/hts/tabix

On Wed, 2016-08-10 at 22:27 +0000, Alex Flint wrote:
> I have a long list of short strings that I want to compress, but I want
> to be able to decompress an arbitrary string in the list at any time
> without decompressing the entire list.
>
> I know the list ahead of time and it doesn't matter how much
> preprocessing time is involved. It is also fine if there is some
> significant O(1) memory overhead at runtime.
> 
> Any suggestions?

