Steve Howell <showel...@yahoo.com> writes:

> My test was to write roughly 4GB of data, with 2 million keys of 2k
> bytes each.
If the records are something like English text, you can compress them
with zlib and get some compression gain by pre-initializing a zlib
dictionary from a fixed English corpus, then cloning the primed
compressor for each record. That is, if your messages are a couple of
paragraphs, you might say something like:

    import zlib

    iv = ...  # some fixed 20k or so of records concatenated together

    base = zlib.compressobj()
    base.compress(iv)              # throw away the output of
    base.flush(zlib.Z_SYNC_FLUSH)  # compressing the fixed text, and
                                   # sync to a byte boundary

    compressor = base.copy()       # clone the primed state per record
    zout = (compressor.compress(your_record)
            + compressor.flush(zlib.Z_SYNC_FLUSH))

i.e. the part you save in the file is just the difference between
compress(corpus) and compress(corpus + record). To decompress, you
initialize a decompressor the same way and clone it per record.

It's been a while since I used that trick, but for json records of a
few hundred bytes I remember getting around 2:1 compression, while
starting with an unprepared compressor gave almost no compression.
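To make the decompress step concrete: a minimal sketch under the same
assumptions (iv is byte-for-byte the same corpus the compressor was
primed with; decompress_record is a hypothetical helper name), using
zlib's decompressobj() and its copy() method:

    import zlib

    iv = ...  # the same fixed corpus used by the compressor

    # Rebuild the compressed prefix the compressor emitted for the
    # corpus, and prime a decompressor with it.
    c = zlib.compressobj()
    iv_z = c.compress(iv) + c.flush(zlib.Z_SYNC_FLUSH)

    base = zlib.decompressobj()
    base.decompress(iv_z)      # discard: this just replays the corpus

    def decompress_record(zout):
        d = base.copy()        # clone the primed state per record
        return d.decompress(zout)

On Python 3.3 and later, zlib.compressobj(zdict=iv) and
zlib.decompressobj(zdict=iv) give much the same priming effect
directly, without the compress-and-discard step.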