On Fri, Dec 25, 2009 at 08:41:01AM -0600, Kyle Wheeler wrote:
> But if you're here to rehash the "fast" argument, I think we can't get
> anywhere without pointing to the CourierMTA's webpage of mbox/maildir
> benchmarks: http://www.courier-mta.org/mbox-vs-maildir/

I'm very familiar with this page, and I consider it fairly useless.

First, it has a tendency to focus on operations where Courier wins, and somewhat downplays cases where it doesn't. For example, it does no tests at all with extremely large numbers of messages. On typical Unix file systems, maildir basically falls over, because accessing files in large directories is inherently slow (to the point of being painful) on such filesystems.

Second, and much more importantly, it assumes that University of Washington's mbox implementation is representative of how well mbox is capable of performing, and that Courier's maildir implementation is similarly representative of how well maildir performs. In other words, you're actually comparing two specific implementations -- not mbox vs. maildir per se. That pretty much invalidates every aspect of the conclusions drawn on this page (though they may well be valid for Courier vs. UW-IMAP).

Despite this, it is interesting to note that UW-IMAP mostly outperforms Courier on low-end hardware *by a lot*, with the sole exception of the very special case of expunge (which the study calls delete), whereas on high-end hardware, Courier wins by only a small margin.

It's been a long while since I looked at UW's implementation, but I do remember thinking that it had a number of opportunities for optimization. I believe, for example, that UW-IMAP's caching was basically nonexistent (which would explain why Courier does so much better on all the .2 tests). When comparing Mutt's implementations of mbox vs. maildir, mbox BLOWS AWAY maildir when opening large mailboxes (i.e. pre-header-caching).

IIRC UW-IMAP also uses stdio... which, being double-buffered, is the least efficient method of I/O. On reasonably modern (i.e. not broken) implementations, using memory-mapped I/O is substantially faster. For maildir, the difference probably wouldn't matter much, since the reads and writes tend to be small. For mbox, it matters a lot (see W. Richard Stevens, Advanced Programming in the UNIX Environment, for an example of how drastically MMIO can improve I/O performance).

> You only need to open(2) every individual message if you're reading
> the whole thing for the first time. You certainly don't need to do
> that if you're delivering mail, or deleting mail, or marking a message
> as read, or what have you.

Yes, exactly. I have dozens of mailboxes, most of which (in my work environment) are high-volume folders... With my usage patterns (especially pre-headercache), the speed of opening mailboxes matters A LOT. FWIW, last I was paying attention, mbox was not receiving the benefits of header caching in Mutt.

For my particular usage patterns, this matters much, much more than, say, the time it takes to expunge a single message from a large mailbox. With my particular usage patterns, the latter case happens pretty much never. Opening large mailboxes happens pretty frequently.

As it happens, mbox (on Mutt at least) is actually about the same or faster for almost all of the operations that actually make a difference to my e-mail experience. I tend to keep my busy incoming folders small, and either delete or archive messages from those folders into mbox folders when I'm done processing them. I rarely delete messages from those mbox folders, but I still do open them very frequently to remind myself of whatever's in the messages I saved there. So for me, maildir's huge win deleting messages in large folders is a *complete* non-issue. A good mbox implementation with caching will perform about as well as or even beat maildir handily in almost every other case.

For me, using maildir was as much about Mutt's behavior when using it, as it was about performance and safety.
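
To make the "opening a large mailbox" comparison concrete, here is a rough sketch of what each format asks of the filesystem on a first full read. It is an illustration only, not Mutt's, Courier's, or UW-IMAP's actual code: the function names and demo file names are made up, the mbox scan simply looks for lines starting with "From " (real mbox parsing has more rules than that), and the maildir side just reads the first line of every file in new/ and cur/. The mbox side maps the one file into memory; the maildir side needs one open(2) per message.

```python
import mmap
import os
import tempfile


def count_mbox_messages(path):
    """Count messages in an mbox by scanning one memory-mapped file.

    Simplification: any line starting with "From " counts as a separator.
    """
    if os.path.getsize(path) == 0:
        return 0  # an empty file cannot be memory-mapped
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        count = 1 if m[:5] == b"From " else 0
        pos = m.find(b"\nFrom ")
        while pos != -1:
            count += 1
            pos = m.find(b"\nFrom ", pos + 1)
        return count


def count_maildir_messages(path):
    """Count messages in a maildir, opening every file once, the way a
    first header scan without a cache would have to."""
    count = 0
    for sub in ("new", "cur"):
        directory = os.path.join(path, sub)
        for name in os.listdir(directory):
            with open(os.path.join(directory, name), "rb") as f:
                f.readline()  # touch the message, as a header parse would
            count += 1
    return count


def make_demo(root, n):
    """Write the same n tiny messages as one mbox and as one maildir.

    File names and addresses are hypothetical demo values.
    """
    mbox = os.path.join(root, "demo.mbox")
    maildir = os.path.join(root, "demo.maildir")
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(maildir, sub))
    with open(mbox, "wb") as out:
        for i in range(n):
            msg = b"Subject: test %d\n\nbody %d\n" % (i, i)
            out.write(b"From demo@example.invalid Fri Dec 25 08:41:01 2009\n")
            out.write(msg + b"\n")
            with open(os.path.join(maildir, "new", "%d.demo" % i), "wb") as f:
                f.write(msg)
    return mbox, maildir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        mbox, maildir = make_demo(root, 1000)
        print(count_mbox_messages(mbox), count_maildir_messages(maildir))
```

Wrapping the two count calls in time.perf_counter() and raising n gives a crude feel for the difference on a given filesystem. It only compares these two toy readers, which is the same caveat that applies to the Courier page.
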
With recent improvements from Brendan and/or Rocco, the behavior is no longer sufficiently different that there's really any benefit at all for me to use maildir (I don't keep my mail on network shares of any sort), but there is genuine benefit from using mbox. It may still be true that mbox is not receiving the benefits of hcache, but if so I don't really notice the difference. I still do use both, but it's mostly a remnant of past issues that no longer exist.

--
Derek D. Martin
http://www.pizzashack.org/
GPG Key ID: 0xDFBEAD02

-=-=-=-=-
This message is posted from an invalid address. Replying to it will result in undeliverable mail due to spam prevention. Sorry for the inconvenience.