Just to be clear, I'm still *NOT* saying that mbox is inherently
better than maildir.  That said...

On Sat, Dec 26, 2009 at 02:25:01PM -0600, Derek Martin wrote:
> Despite this, it is interesting to note that UW-IMAP mostly
> outperforms Courier on low-end hardware *by a lot*, with the sole
> exception of the very special case of expunge (which the study calls
> delete), whereas on high-end hardware, Courier wins by only a small
> margin.

Well, I wasn't looking at the graphs closely enough... this is not
true.  But that doesn't take away from my other points.

To add to those, a third point about the analysis which I neglected to
mention is that it doesn't take into account that Courier may have a
much more efficient implementation of IMAP and other underlying code.
This ties in to my second point, and you can summarize both by saying
that the analysis fails to distinguish performance differences caused
by the folder format from those caused by other implementation
details.  Having looked a little at UW's
implementation in the past (though not recently enough to be certain
or to explain why), my guess is Courier is more efficient generally.

If you look at the Phase I & II graph for select.1 and select.2, there
seems to be some evidence: on low-end hardware, UW-IMAP wins by 20s on
mailboxes of 10,000 messages on the first pass.  On the second pass,
Courier takes almost constant time to process mailboxes of any tested
size, while UW's performance slope is considerably higher.  Processing
100 messages appears to take the same amount of time for UW on both
passes.  Processing 10,000 messages takes roughly 9x longer on UW than
on Courier on the second pass.  

So, either Courier is caching internally and UW is relying on file
system caching (yielding worse performance), or they both cache
internally and Courier's caching code is good, while UW's caching code
is crap.  Something of the sort *must* be true: if both were using
efficient caching, their performance should be roughly identical for
the second pass, since neither one would need to read from the disk;
in other words, for the second pass, the message store's on-disk
format should not come into play *at all*.

This factor also shows itself if you look at the way he calculates
CPU usage.  He's taking the average of user+sys and real.  Look at the
results for 2,000 messages on high-end hardware.  UW's user+sys is
about 0.8s, but its real time is 3.068s.  So, what the hell was it
doing for the other 2.2s?  Something is fishy here.  For Courier, the
numbers add up a bit better: 0.030s + 2.120s = 2.150s, where the real
time is 2.147s.  It's odd that user+sys adds up to more than real,
but very likely Linux isn't perfectly accurate in accounting for CPU
time.  But
what about our 2.2s difference with UW?  That's more than enough to be
significant.  The difference between real and user+sys should be the
amount of time the process was sleeping.  We have neither access to
his test machines nor a time machine, so we can only make guesses
about why it was sleeping.  Most likely, either UW's code is very
inefficient, or the server was swapped out.  In neither case can you
blame that on the folder format.
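
For anyone who wants to redo the arithmetic, here is a minimal sketch
in Python.  The function names are mine, not the study's; the figures
are the ones quoted above, and UW's user+sys is only "about" 0.8s, so
treat its result as approximate.

```python
def unaccounted(real, user_plus_sys):
    """Time the process was neither in user nor kernel mode (sleeping)."""
    return real - user_plus_sys


def study_cpu_metric(real, user_plus_sys):
    """The study's figure as described above: mean of user+sys and real."""
    return (user_plus_sys + real) / 2


# 2,000 messages, high-end hardware, numbers as quoted in this message.
uw_real, uw_cpu = 3.068, 0.8                      # 0.8 is approximate
courier_real, courier_cpu = 2.147, 0.030 + 2.120

for name, real, cpu in (("UW-IMAP", uw_real, uw_cpu),
                        ("Courier", courier_real, courier_cpu)):
    print(f"{name}: unaccounted={unaccounted(real, cpu):+.3f}s "
          f"study_metric={study_cpu_metric(real, cpu):.3f}s")
```

The unaccounted time comes out at roughly 2.27s for UW and -0.003s for
Courier, which is the gap I'm complaining about.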

He's not wrong that real (wall-clock) time matters when it comes to
responsiveness when using imapd, but it very probably is wrong to
attribute that lost time to the mailbox format.  We can't really
know without a more detailed analysis.  Such detail is suitable for
comparing UW-IMAP to Courier, but not for comparing maildir to mbox.

So, my take on that analysis is that it's pretty much worthless.  For
the analysis to be worth anything, it needs to at least eliminate IMAP
from the picture, and take a stab at analyzing the efficiency of the
implementation.  You'd need to write client code that uses the mailbox
drivers of both, but uses the same code to feed the drivers, and then
compare any inefficiencies and optimizations in the implementations of
each driver. You'd need to eliminate caching, since caching eliminates
or reduces the need to actually perform I/O on the mail store.  Then,
and only then, you'd have something worth talking about.
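
To give a rough idea of the shape of such a test, here is a sketch in
Python.  Big assumption: it uses Python's standard-library mailbox
module as a stand-in for the two drivers, NOT UW's or Courier's code,
and the helper names (timed, fill, read_all, compare) are made up for
illustration.  It feeds identical messages to both drivers through the
same code and reports real time next to user+sys.  It does not defeat
the OS page cache, so it is not the full experiment described above.

```python
import mailbox
import os
import tempfile
import time


def timed(fn):
    """Run fn() and return (result, real_seconds, user_plus_sys_seconds)."""
    t0 = os.times()
    w0 = time.perf_counter()
    result = fn()
    real = time.perf_counter() - w0
    t1 = os.times()
    cpu = (t1.user - t0.user) + (t1.system - t0.system)
    return result, real, cpu


def fill(box, count):
    """Feed the same synthetic messages to any mailbox.Mailbox driver."""
    for i in range(count):
        box.add(f"From: a@example.invalid\nSubject: test {i}\n\nbody {i}\n")
    box.flush()


def read_all(box):
    """Fetch every message through the driver; return total bytes read."""
    return sum(len(box.get_bytes(key)) for key in box.iterkeys())


def compare(count=200):
    """Time a full read of `count` messages via each driver."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        drivers = {
            "mbox": mailbox.mbox(os.path.join(tmp, "box.mbox")),
            "maildir": mailbox.Maildir(os.path.join(tmp, "box.maildir")),
        }
        for name, box in drivers.items():
            fill(box, count)
            total, real, cpu = timed(lambda: read_all(box))
            results[name] = {"messages": len(box), "bytes": total,
                             "real": real, "cpu": cpu}
            box.close()
    return results


if __name__ == "__main__":
    for name, r in compare().items():
        print(f"{name}: {r['messages']} msgs, real={r['real']:.4f}s, "
              f"user+sys={r['cpu']:.4f}s")
```

Any difference this shows is still a property of Python's two driver
implementations as much as of the formats, which is exactly the
attribution problem; a serious test would also have to look inside
each driver.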

And again, I'm not saying that his conclusions are wrong; I'm just
saying that his analysis is bunk. =8^)

-- 
Derek D. Martin    http://www.pizzashack.org/   GPG Key ID: 0xDFBEAD02
-=-=-=-=-
This message is posted from an invalid address.  Replying to it will result in
undeliverable mail due to spam prevention.  Sorry for the inconvenience.
