Just to be clear, I'm still *NOT* saying that mbox is inherently better than maildir. That said...
On Sat, Dec 26, 2009 at 02:25:01PM -0600, Derek Martin wrote:
> Despite this, it is interesting to note that UW-IMAP mostly
> outperforms Courier on low-end hardware *by a lot*, with the sole
> exception of the very special case of expunge (which the study calls
> delete), whereas on high-end hardware, Courier wins by only a small
> margin.

Well, I wasn't looking at the graphs closely enough... this is not true. But that doesn't take away from my other points.

To add to those, a third point about the analysis which I neglected to mention is that it doesn't take into account that Courier may have a much more efficient implementation of IMAP and other underlying code. This ties in to my second point, and you can summarize both by saying that the analysis fails to distinguish differences in performance caused by the folder format from those caused by other implementation details.

Having looked a little at UW's implementation in the past (though not recently enough to be certain or to explain why), my guess is that Courier is more efficient generally. If you look at the Phase I & II graph for select.1 and select.2, there seems to be some evidence: on low-end hardware, UW-IMAP wins by 20s on mailboxes of 10,000 messages on the first pass. On the second pass, Courier takes almost constant time to process mailboxes of any tested size, while UW's performance slope is considerably higher. Processing 100 messages appears to take the same amount of time for UW on both passes. Processing 10,000 messages takes roughly 9x longer on UW than on Courier on the second pass.

So, either Courier is caching internally and UW is relying on file system caching (yielding worse performance), or they both cache internally and Courier's caching code is good, while UW's caching code is crap.
Something of the sort *must* be true: if both were using efficient caching, their performance should be roughly identical for the second pass, since neither one would need to read from the disk; in other words, for the second pass, the message store's on-disk format should not come into play *at all*.

This factor also shows itself if you look at the way he calculates CPU usage. He's taking the average of user+sys and real. Look at the results for 2,000 messages on high-end hardware. UW's user+sys is about 0.8s, but its real is 3.068s. So, what the hell was it doing for the other 2.2s? Something is fishy here. For Courier, the numbers add up a bit better: 0.030s + 2.120s = 2.150s, where the real time is 2.147s. It's odd that user+sys adds up to more than real, but very likely Linux isn't perfectly accurate in accounting for CPU time.

But what about our 2.2s difference with UW? That's more than enough to be significant. The difference between real and user+sys should be the amount of time the process was sleeping. We have neither access to his test machines nor a time machine, so we can only make guesses about why it was sleeping. Most likely, either UW's code is very inefficient, or the server was swapped out. In neither case can you blame that on the folder format.

He's not wrong that the real time matters when it comes to responsiveness when using imapd, but it very probably is wrong to attribute that loss of time to the mailbox format. We can't really know without a more detailed analysis. Such a detail is suitable for comparing UW-IMAP to Courier, but not for comparing maildir to mbox.

So, my take on that analysis is that it's pretty much worthless. For the analysis to be worth anything, it needs to at least eliminate IMAP from the picture, and take a stab at analyzing the efficiency of the implementation.
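
For what it's worth, the user+sys versus real comparison above is just subtraction, and can be sketched in a few lines of Python. The function name is my own invention, and the 0.8s figure for UW is the rounded user+sys total quoted above, not a precise measurement, so treat the UW result as approximate.

```python
def unaccounted(cpu, real):
    """Wall-clock seconds not spent on the CPU (sleeping, waiting on I/O, etc.)."""
    return real - cpu

# Figures quoted above: 2,000 messages, high-end hardware.
# Courier: user and sys are given separately.
courier_gap = unaccounted(0.030 + 2.120, 2.147)

# UW-IMAP: only the approximate user+sys total (~0.8s) is given.
uw_gap = unaccounted(0.8, 3.068)

print(f"Courier: {courier_gap:+.3f}s")
print(f"UW-IMAP: {uw_gap:+.3f}s")
```

Courier's gap is a few milliseconds, within the range of accounting noise, while UW's is over two seconds. That gap says nothing about *why* the process was off the CPU, which is the whole point.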
You'd need to write client code that uses the mailbox drivers of both, but uses the same code to feed the drivers, and then compare any inefficiencies and optimizations in the implementations of each driver. You'd need to eliminate caching, since caching eliminates or reduces the need to actually perform I/O on the mail store. Then, and only then, you'd have something worth talking about.

And again, I'm not saying that his conclusions are wrong; I'm just saying that his analysis is bunk. =8^)

-- 
Derek D. Martin
http://www.pizzashack.org/
GPG Key ID: 0xDFBEAD02
-=-=-=-=-
This message is posted from an invalid address. Replying to it will result in
undeliverable mail due to spam prevention. Sorry for the inconvenience.