On Thursday, 11 August 2016 02:24:49 UTC+3, Alex Flint wrote:
>
> There are around 2M strings, and their total size is ~6 GB, so an average
> of 3k each.
>

What kind of data? How large is the alphabet? What is the distribution of
letters? Examples would be good :)

>
> I actually looked briefly at Go's compress/flate to see whether something
> like what you're describing is possible without writing my own compressor
> but I couldn't see any obvious way to get at the underlying compressor
> state. Or perhaps I'm looking in the wrong package - any pointers would be
> appreciated.
>
> On Wed, Aug 10, 2016 at 3:42 PM Ian Lance Taylor <ia...@golang.org> wrote:
>
>> On Wed, Aug 10, 2016 at 3:27 PM, Alex Flint <alex....@gmail.com> wrote:
>> >
>> > I have a long list of short strings that I want to compress, but I
>> > want to be able to decompress an arbitrary string in the list at any
>> > time without decompressing the entire list.
>> >
>> > I know the list ahead of time and it doesn't matter how much
>> > preprocessing time is involved. It is also fine if there is some
>> > significant O(1) memory overhead at runtime.
>> >
>> > Any suggestions?
>>
>> You say the strings are "short": how short? How many strings are
>> there? How much total data in the uncompressed strings?
>>
>> What is your target for the total amount of memory used by the
>> compressed strings plus any data required to decompress them?
>>
>> One approach that comes to mind is building an optimized Huffman table
>> for the full set of strings, and compressing each one separately using
>> that table. Then each string is represented by a bit offset into the
>> resulting bitstream, and each can be decompressed separately. But you
>> would need storage at run time not only for the bitstream, but also
>> for the Huffman table.
>>
>> Ian
>>
>
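
For reference, Ian's suggestion quoted above (one Huffman table built over the full set of strings, each string encoded separately and addressed by a bit offset) could be sketched in Go roughly as follows. This is only a minimal byte-level sketch and not code from the thread: the names Corpus, Build and Get are hypothetical, the symbols are assumed to be single bytes, and the tree is kept as plain pointers rather than a compact table.

```go
package main

import (
	"container/heap"
	"fmt"
)

// node is one node of the Huffman tree; leaves (left == nil) carry a byte.
type node struct {
	freq        int
	sym         byte
	left, right *node
}

// nodeHeap is a min-heap of nodes ordered by frequency.
type nodeHeap []*node

func (h nodeHeap) Len() int            { return len(h) }
func (h nodeHeap) Less(i, j int) bool  { return h[i].freq < h[j].freq }
func (h nodeHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *nodeHeap) Push(x interface{}) { *h = append(*h, x.(*node)) }
func (h *nodeHeap) Pop() interface{} {
	old := *h
	n := old[len(old)-1]
	*h = old[:len(old)-1]
	return n
}

// Corpus (hypothetical name) holds the shared tree, the packed bitstream,
// and one bit offset per string plus a final entry for the total bit count.
type Corpus struct {
	root    *node
	bits    []byte
	offsets []int
}

// assignCodes walks the tree and records the bit path to every leaf.
func assignCodes(n *node, prefix []bool, codes map[byte][]bool) {
	if n.left == nil {
		codes[n.sym] = append([]bool(nil), prefix...)
		return
	}
	assignCodes(n.left, append(prefix, false), codes)
	assignCodes(n.right, append(prefix, true), codes)
}

// Build counts byte frequencies over all strings, builds one Huffman tree,
// and encodes every string back to back into a single bitstream.
func Build(strs []string) *Corpus {
	var freq [256]int
	for _, s := range strs {
		for i := 0; i < len(s); i++ {
			freq[s[i]]++
		}
	}
	h := &nodeHeap{}
	for b, f := range freq {
		if f > 0 {
			heap.Push(h, &node{freq: f, sym: byte(b)})
		}
	}
	if h.Len() == 1 { // pad so a lone symbol still gets a 1-bit code
		heap.Push(h, &node{})
	}
	for h.Len() > 1 {
		a, b := heap.Pop(h).(*node), heap.Pop(h).(*node)
		heap.Push(h, &node{freq: a.freq + b.freq, left: a, right: b})
	}
	c := &Corpus{offsets: []int{0}}
	codes := map[byte][]bool{}
	if h.Len() == 1 {
		c.root = heap.Pop(h).(*node)
		assignCodes(c.root, nil, codes)
	}
	n := 0 // number of bits written so far
	for _, s := range strs {
		for i := 0; i < len(s); i++ {
			for _, bit := range codes[s[i]] {
				if n%8 == 0 {
					c.bits = append(c.bits, 0)
				}
				if bit {
					c.bits[n/8] |= 1 << uint(n%8)
				}
				n++
			}
		}
		c.offsets = append(c.offsets, n)
	}
	return c
}

// Get decodes only string i by walking the tree from its bit offset
// up to the offset of the next string.
func (c *Corpus) Get(i int) string {
	var out []byte
	n := c.root
	for p := c.offsets[i]; p < c.offsets[i+1]; p++ {
		if c.bits[p/8]&(1<<uint(p%8)) != 0 {
			n = n.right
		} else {
			n = n.left
		}
		if n.left == nil { // reached a leaf: emit its byte, restart at the root
			out = append(out, n.sym)
			n = c.root
		}
	}
	return string(out)
}

func main() {
	c := Build([]string{"alpha", "beta", "gamma", "delta"})
	fmt.Println(c.Get(2)) // decodes only the third string
}
```

As Ian notes, the runtime cost is the bitstream plus the table; in this sketch there is also one int offset per string, which for ~2M strings is small next to the data itself. Whether a byte-level table compresses well enough depends on the alphabet and letter distribution asked about above.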