There are around 2M strings, and their total size is ~6 GB, so an average
of 3k each.

I actually looked briefly at Go's compress/flate to see whether something
like what you're describing is possible without writing my own compressor
but I couldn't see any obvious way to get at the underlying compressor
state. Or perhaps I'm looking in the wrong package - any pointers would be
appreciated.

On Wed, Aug 10, 2016 at 3:42 PM Ian Lance Taylor <i...@golang.org> wrote:

> On Wed, Aug 10, 2016 at 3:27 PM, Alex Flint <alex.fl...@gmail.com> wrote:
> >
> > I have long list of short strings that I want to compress, but I want to
> be
> > able to decompress an arbitrary string in the list at any time without
> > decompressing the entire list.
> >
> > I know the list ahead of time and it doesn't matter how much
> preprocessing
> > time is involved. It is also fine if there is some significant O(1)
> memory
> > overhead at runtime.
> >
> > Any suggestions?
>
> You say the strings are "short": how short?  How many strings are
> there?  How much total data in the uncompressed strings?
>
> What is your target for the total amount of memory used by the
> compressed strings plus any data required to decompress them?
>
> One approach that comes to mind is building an optimized Huffman table
> for the full set of strings, and compressing each one separately using
> that table.  Then each string is represented by a bit offset into the
> resulting bitstream, and each can be decompressed separately.  But you
> would need storage at run time not only for the bitstream, but also
> for the Huffman table.
>
> Ian
>

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to