Locking issues with mbox are the reason for my long-lasting love affair with maildir, and it has lasted for many years. OK, life's lessons are like this: learn something and move on with it ;) even if it's a "new old thing". Thank you for pointing that out!
What I had doubts about was the default rotate size of 2M, since I am used to seeing pretty reasonable defaults throughout the Dovecot config. 32M or 64M is much closer to what I'd personally prefer.

What I also have to choose now is the OS and FS for the archive. I am seriously considering ZFS with compression on FreeBSD (in fact it will be a stripe over a couple of mirrors, i.e. the software equivalent of RAID 10 on SATA drives, with compression at the FS level), or XFS over LVM on Debian with compression in mdbox itself. I see pros and cons for both, so that's the question to answer!

Yours,
Alexander

> On 11/14/2011 8:35 AM, Alexander Chekalin wrote:
>> Timo, Stan,
>>
>> I've just tested mdbox and find it pretty nice for me, but now I got
>> some questions for you:
>>
>> 1. mdbox uses 'a lot' files (m.1, m.2 ... etc), and the default size if
>> 2Mb. Looks like not even every message can fit into such storage
>> container volume (nowadays we used to see messages of 20Mb and even
>> more). Should I tune it (at least mdbox_rotate_size and
>> mdbox_rotate_interval) or its size is on purpose? As for now I store
>> each day's messages in separate IMAP folders (mailboxes), which gives me
>> 2000-6000 messages and 2-5 Gb (on disk) per folder.
>
> mdbox_rotate_size of 2MB is too small for your needs. Test 32MB and 64MB.
>
>> 2. I can use no compression, gz and bz2 - which one will be better for
>> storing archive messages? I've just tested mdbox by copying 5800+ msgs
>> from maildir to compressed mdbox, and it took exactly the same size (2.8
>> G) in 100+ small m.* files. No good as far.
>
> bzip2 may give you a little better compression but at the cost of much
> lower de/compression speed and higher CPU and memory consumption. gzip
> will be faster all around, between 4x-8x, with lower mem usage, but with
> less compression resulting in slightly larger file sizes than bzip2.
>
>> 3. What if I use maildir as I do now but turn on compression, will this
>> speed things up?
>
> No. Maildir performance is limited by the disk head actuator speed,
> which is between 150-300 seeks per second depending on your disk (7.2k
> vs 15k RPM). Compressing the files doesn't change the seek physics of
> the disk drives. You're still reading tens of thousands of files when
> doing your searches thus bouncing the heads tens of thousands of times.
>
> mbox uses a single file, so head speed isn't a factor, as it may only
> move a few times when reading an entire mailbox file. Thus, bandwidth
> becomes the potential bottleneck. Using compression with large mbox
> files can substantially increase search performance as effective
> bandwidth is increased by ~4x using gzip and 6x using bzip2. This
> assumes you have plenty of excess CPU power. mdbox should see similar
> compression speedups if you use file sizes much larger than the 2MB
> default. Doing so should keep your IOPS well below the drive's head
> saturation point as you're reading only a fraction of the file count
> compared to maildir.
>
>> I'd like to use mdbox as storage but for now it is very new for me and I
>> simple afraid what should I do if I'll need to manually fix the storage
>> (maildir is really good for that, surely).
>
> Doveadm handles such tasks pretty well. Just make sure you keep good
> backups of your mdbox files.
>
>> After all, I simple need to speed up the search and restore process in
>> archive.
>
> The only way to accomplish this with maildir is with much bigger,
> faster, more expensive storage hardware. And the gain will still be
> much less than simply switching to a larger file format such as mbox or
> mdbox.
>
> As with many things some computer technologies come full circle over
> time. One of the reasons the creators of the UNIX mbox mail file format
> decided upon a single file many decades ago was the horribly limited
> seek performance of the slow SCSI disks of that period. Doing something
> like the maildir format was simply impossible at that time. In the
> early days of the public internet, disk became faster than the average
> load and maildir was born to fix the locking and corruption shortcomings
> of mbox.
>
> Today many sites are hitting the seek problem of a few decades ago
> because boxes are oversubscribed with users, emails now frequently
> contain attachments, everyone is storing more email, and the total
> volume of email is a few orders of magnitude greater.
>
> IIRC, this is one of the reasons Timo created mdbox--to decrease the
> massive IOPS load, and thus slow performance, of large maildir stores.
>
> --
> Stan
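
A rough back-of-envelope sketch of the seek argument quoted above, in Python. It is only an illustration under loud assumptions: exactly one head seek per file, a 3 GB folder (picked from the 2-5 GB range mentioned in the thread), 5800 messages, and the quoted 150-300 seeks per second. The helper names (file_count, seek_seconds, estimate) are made up for this example and have nothing to do with Dovecot itself.

```python
import math

MB = 1024 * 1024
GB = 1024 * MB


def file_count(total_bytes: int, rotate_bytes: int) -> int:
    """Lower bound on the number of m.* files if each holds at most rotate_bytes.

    Assumption: messages pack perfectly; messages larger than the rotate
    size are ignored.
    """
    return math.ceil(total_bytes / rotate_bytes)


def seek_seconds(files: int, seeks_per_sec: float) -> float:
    """Seek time only, assuming exactly one head seek per file read."""
    return files / seeks_per_sec


def estimate(total_bytes, messages, rotate_sizes_mb, seek_rates):
    """Return rows of (layout, file count, {seek rate: seconds of seeking})."""
    # maildir: one file per message, so one assumed seek per message.
    rows = [("maildir", messages,
             {r: seek_seconds(messages, r) for r in seek_rates})]
    # mdbox: file count depends on the rotate size.
    for size_mb in rotate_sizes_mb:
        n = file_count(total_bytes, size_mb * MB)
        rows.append((f"mdbox {size_mb}MB", n,
                     {r: seek_seconds(n, r) for r in seek_rates}))
    return rows


if __name__ == "__main__":
    # 3 GB and 5800 messages are assumed example values, not measurements.
    for layout, files, secs in estimate(3 * GB, 5800, [2, 32, 64], [150, 300]):
        timings = ", ".join(f"{s:.2f}s at {r}/s" for r, s in secs.items())
        print(f"{layout:12} {files:5d} files  {timings}")
```

This counts seeks only. It ignores transfer bandwidth, index and filesystem metadata reads, fragmentation and the page cache, so the real numbers will differ; it is meant only to show how the file count, and with it the seek load, shrinks as the rotate size grows.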