There are around 2M strings, and their total size is ~6 GB, so an average of 3k each.
I actually looked briefly at Go's compress/flate to see whether something like what you're describing is possible without writing my own compressor but I couldn't see any obvious way to get at the underlying compressor state. Or perhaps I'm looking in the wrong package - any pointers would be appreciated. On Wed, Aug 10, 2016 at 3:42 PM Ian Lance Taylor <i...@golang.org> wrote: > On Wed, Aug 10, 2016 at 3:27 PM, Alex Flint <alex.fl...@gmail.com> wrote: > > > > I have long list of short strings that I want to compress, but I want to > be > > able to decompress an arbitrary string in the list at any time without > > decompressing the entire list. > > > > I know the list ahead of time and it doesn't matter how much > preprocessing > > time is involved. It is also fine if there is some significant O(1) > memory > > overhead at runtime. > > > > Any suggestions? > > You say the strings are "short": how short? How many strings are > there? How much total data in the uncompressed strings? > > What is your target for the total amount of memory used by the > compressed strings plus any data required to decompress them? > > One approach that comes to mind is building an optimized Huffman table > for the full set of strings, and compressing each one separately using > that table. Then each string is represented by a bit offset into the > resulting bitstream, and each can be decompressed separately. But you > would need storage at run time not only for the bitstream, but also > for the Huffman table. > > Ian > -- You received this message because you are subscribed to the Google Groups "golang-nuts" group. To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.