On Fri, Feb 5, 2010 at 4:36 PM, Timo Sirainen <t...@iki.fi> wrote:
> I was wondering if I should add compression support to mdbox one mail at
> a time or one file (~2MB) at a time. The tradeoffs are:
>
> * one mail at a time allows quickly seeking to wanted mail inside the
>   file, but it can't compress mails as well
> * one file at a time compresses better, but seeking is slow because it
>   can only be done by uncompressing all the data until the wanted offset
>   is reached
>
> I did a quick test for this with 27 MB of my old INBOX mails:
>
> (note the -b option, so it doesn't count wasted fs space)
> mdbox/storage% du -sb .
> 15120350 .
>
> Maildir/cur% du -sb .
> 16517320 .
>
> % echo 1-15120350/16517320|bc -l
> .08457606924125705623
>
> So, compressed mdboxes take 8.5% less space. This was with regular gzip
> compression with default level. With bzip2 -9 compression the difference
> was 10%.
>
> Any thoughts on if 8-10% is significant enough improvement to make
> seeking performance worse? Or perhaps I should just implement both
> ways.. :)

Isn't the real difference even smaller? Measured against the original
27 MB (28311552 bytes):

15120350/28311552 = .534
16517320/28311552 = .583

So that's just under 5% of the uncompressed size. Either way, I'd say go
with compressing each mail individually for quick seeking.

Also, if you were compressing the whole file of mails as a single stream,
wouldn't you have to recompress and rewrite the whole file for each new
mail delivered?

Matt
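
P.S. To make the numbers and the seeking tradeoff concrete, here is a rough Python sketch. It is not how mdbox does or would do it: the sample mails, the in-memory (offset, length) index, and the helper names are all made up for illustration, and zlib stands in for gzip. It only shows the arithmetic above and why a per-mail layout can read one mail without touching the rest.

```python
import zlib

# Arithmetic from the thread, taking "27 MB" to mean 27 * 1024 * 1024 bytes.
total = 27 * 1024 * 1024          # 28311552
mdbox, maildir = 15120350, 16517320
saving_vs_original = (maildir - mdbox) / total   # difference as a share of the original data
saving_vs_maildir = 1 - mdbox / maildir          # the figure from the bc command


def make_mail(i):
    # Hypothetical tiny test mail; real mails are larger and less uniform.
    return (f"From: user{i}@example.com\r\nTo: list@example.com\r\n"
            f"Subject: test message {i}\r\n\r\n"
            "Hello, this is a short test body.\r\n").encode("ascii")


mails = [make_mail(i) for i in range(50)]

# Option 1: compress each mail on its own and remember where it landed.
blob = bytearray()
index = []                        # (offset, compressed length) per mail
for m in mails:
    c = zlib.compress(m)
    index.append((len(blob), len(c)))
    blob += c


def read_mail(n):
    # Seek straight to mail n and decompress only that mail.
    off, length = index[n]
    return zlib.decompress(bytes(blob[off:off + length]))


# Option 2: compress all mails as one stream.
stream = zlib.compress(b"".join(mails))


def read_mail_from_stream(n):
    # This sketch decompresses the whole stream to reach mail n.
    data = zlib.decompress(stream)
    start = sum(len(m) for m in mails[:n])
    return data[start:start + len(mails[n])]


per_mail_size = len(blob)
whole_size = len(stream)
```

With toy mails this similar, the single stream comes out far smaller than the per-mail blob, which exaggerates the gap compared to the 8-10% measured on real mail. Whether a new mail can be appended to an existing compressed file without rewriting it depends on the container format, which this sketch does not try to model.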