On 6.2.2010, at 3.03, Matt Reimer wrote:

> Isn't the real difference even smaller?
> 
> 15120350/28311552 = .534
> 16517320/28311552 = .583
> 
> So that's just under 5%.

Well, sure, if you're comparing it to uncompressed data. :) But I think it made 
more sense to compare the two compression possibilities to each other.
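
To make the two ways of comparing concrete, here is a small sketch using only the three byte counts quoted above. The variable names are made up for illustration, and which compressed size belongs to which scheme is not assumed here. Measured against the uncompressed size the gap is a bit under 5 percentage points; measured against each other, the smaller result is roughly 8.5% smaller than the larger one.

```python
# Byte counts taken from the quoted figures above.
uncompressed = 28311552
size_a = 15120350  # smaller compressed result
size_b = 16517320  # larger compressed result

# Comparison 1: each result relative to the uncompressed data.
ratio_a = size_a / uncompressed
ratio_b = size_b / uncompressed
points = (ratio_b - ratio_a) * 100  # difference in percentage points

# Comparison 2: the two compressed results relative to each other.
relative = (1 - size_a / size_b) * 100  # how much smaller A is than B, in %

print(f"ratios: {ratio_a:.3f} vs {ratio_b:.3f}")
print(f"difference vs uncompressed: {points:.2f} percentage points")
print(f"difference between the two: {relative:.2f} %")
```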

> Either way, I'd say go with compressing each mail individually for quick
> seeking.

Maybe.. but if the I/O time is dominated by disk seeks, it probably wouldn't make 
much of a difference whether it reads 2 MB or a few kB from the file. Then there's 
also the extra latency and CPU usage from decompression, but perhaps that 
wouldn't be all that much either. And it would be even lower if the file sizes 
were set smaller, like 200 kB.

But then of course with SSDs the I/O isn't dominated by seeks, so maybe this 
makes less sense there..
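
A rough back-of-the-envelope model of this, as a sketch only: read time is taken as one seek plus size divided by sequential throughput. The seek times and throughputs below are assumed round numbers for illustration, not measurements, and decompression CPU time is ignored entirely.

```python
def read_time(size_bytes, seek_s, throughput_bps):
    """Estimated time for one read: a single seek plus the sequential transfer."""
    return seek_s + size_bytes / throughput_bps

# Assumed, illustrative device parameters (not measured values).
devices = {
    "hdd": {"seek_s": 0.010, "throughput_bps": 100e6},
    "ssd": {"seek_s": 0.0001, "throughput_bps": 500e6},
}

# Read sizes discussed above: a few kB, a 200 kB file, a 2 MB file.
sizes = {"4 kB": 4_000, "200 kB": 200_000, "2 MB": 2_000_000}

for dev, params in devices.items():
    for label, size in sizes.items():
        t = read_time(size, **params)
        print(f"{dev} {label:>6}: {t * 1000:.2f} ms")
```

With these assumed numbers the seek is a large share of the cost on the spinning disk, especially at 200 kB, while on the SSD nearly all of the time goes to the transfer itself, so reading less data matters more there.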

> Also, if you were compressing the whole file of mails as a single stream,
> wouldn't you have to recompress and rewrite the whole file for each new mail
> delivered?

I was thinking that the compression would be delayed so that it would be done 
only after mdbox has already decided that it won't write any more data to the 
file. But it's actually possible to append more data to .gz files (the 
compression wouldn't be any better then though).
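
As a small illustration of the appending point: the gzip format allows several compressed members to be concatenated, and a reader decompresses them as one stream. The sketch below uses Python's standard gzip module on made-up sample data, purely to show the behaviour; it is not how mdbox does anything.

```python
import gzip

# Made-up sample "mails", only for illustration.
mail1 = b"Subject: first\n\n" + b"hello world\n" * 200
mail2 = b"Subject: second\n\n" + b"hello world\n" * 200

# Appending: each mail is compressed on its own and the gzip members
# are simply concatenated, as if the second were appended to the file later.
appended = gzip.compress(mail1) + gzip.compress(mail2)

# Single stream: both mails compressed together in one go.
single = gzip.compress(mail1 + mail2)

# Reading the concatenated members yields the original data back to back.
restored = gzip.decompress(appended)
print(restored == mail1 + mail2)

# Each appended member starts with a fresh compression state, so the
# result is no smaller than per-mail compression.
print(len(appended), len(single))
```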
