On Tue, Jan 23, 2007 at 01:35:35PM +0100, Jonas Thambert wrote:
> I'm using a Adaptec 2010S SCSI RAID card. I have tried
> and tweaked the courier imap server the best I can
> without any luck.
...
> The sd1 disk has 140 t/s. CPU-load is nothing.
And "sd1" is actually a RAID array of some sort, rather than a single disk?

My guess is that 140 tps is a fundamental limit of your RAID array, especially if you are running RAID 5. Try turning off your IMAP server and running a benchmark like bonnie++ (in ports) to establish this.

An IMAP server generates lots of random file reads and writes, with a relatively high proportion of file creations, writes and deletions.

Many people don't seem to realise that a RAID 5 array has far *worse* write performance than a single disk. In a basic RAID 5 array, a single 'write block' operation actually takes 4 disk transactions across 2 disks:

  1a. read the old data block
  1b. read the old parity block
  2.  calculate the new parity (= old parity ^ old data ^ new data)
  3a. write the new data block
  3b. write the new parity block

1a and 1b can take place concurrently on the two disks, but step 2 can't be done until both 1a and 1b are complete. Steps 3a and 3b can be sped up by writing via a battery-backed cache, but the rest is laws-of-physics stuff. (There's a small sketch of this read-modify-write sequence at the end of this message.)

Some solutions you can consider are:

(1) Use mirroring (RAID 1) instead of RAID 5, since disks are cheap. With RAID 1, a write operation simply has to write the same data block to both disks, which happens concurrently. You also get double the number of read operations per second, since you have two copies of the data, so one client can be searching for a block while a second client searches for another block on the other disk.

(2) Use a filesystem which intrinsically coalesces writes. The best example I can cite is the Network Appliance WAFL filesystem. NetApps give extremely good performance but are very expensive (although worth it IMO). Sun's ZFS looks to be an upcoming contender in this space; building a fileserver using OpenSolaris + ZFS + NFS is an option, and the FreeBSD port of ZFS is nearing completion. No option for OpenBSD that I'm aware of though :-(

(3) Divide your users' mail directories across multiple disks or RAID sets, either with a database, or even symlinks (e.g. /var/mail/0-7 are symlinked to one disk, /var/mail/8-f are symlinked to another; a sketch of this also appears at the end of this message). This is better than striping IMO. For example, if you have six disks, I'd recommend three mirrored pairs mounted on /mail1, /mail2 and /mail3, rather than striping-over-mirroring or vice versa. Then if you lose a pair of disks, at least 2/3rds of your mail is unaffected.

(4) Off-load other disk operations to another disk. Now, you don't say much about your IMAP cluster, but presumably it receives incoming mail using SMTP or LMTP. This means you have an MTA (e.g. sendmail, postfix, exim, etc.) which accepts the mail. This MTA will need its own spool directory where it stores a copy of each incoming message until it has been successfully delivered into its final place. Put this on a different disk to speed things up (you can get away with a single disk, if you are prepared to accept the small risk of a handful of messages being lost if this disk fails). The spool directory is often a heavy offender because for each incoming message there will be a create - write - sync - read - delete sequence of operations. For additional performance, Exim has the ability to split its spool directory across a number of subdirectories, which you can symlink to multiple disks. Putting the MTA spool directory on a battery-backed RAM disk is best of all.

At the very least, separating things out this way will make it clear in the tps figures how much is due to the MTA spooling and how much is due to operations in the users' mailstores.
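Since the read-modify-write cycle is the crux of the RAID 5 problem, here is a toy Python sketch of the 1a-3b sequence above. It is only an illustration: the "disks" are in-memory bytearrays and the names are made up, but the transaction count and the parity XOR are the same as on a real array.

    def xor_blocks(a, b):
        # XOR two equal-sized blocks together
        return bytes(x ^ y for x, y in zip(a, b))

    def raid5_write_block(data_disk, parity_disk, offset, blocksize, new_data):
        # data_disk and parity_disk are bytearrays standing in for the two
        # spindles touched by this one logical write
        # 1a. read the old data block        (disk transaction 1, data disk)
        old_data = bytes(data_disk[offset:offset + blocksize])
        # 1b. read the old parity block      (disk transaction 2, parity disk)
        old_parity = bytes(parity_disk[offset:offset + blocksize])
        # 2.  new parity = old parity ^ old data ^ new data   (CPU only, no I/O)
        new_parity = xor_blocks(xor_blocks(old_parity, old_data), new_data)
        # 3a. write the new data block       (disk transaction 3, data disk)
        data_disk[offset:offset + blocksize] = new_data
        # 3b. write the new parity block     (disk transaction 4, parity disk)
        parity_disk[offset:offset + blocksize] = new_parity

One logical write becomes four disk transactions, which is why a RAID 5 set's random-write tps ends up a fraction of what the raw spindles could manage, and why 140 t/s may simply be your array's ceiling.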
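And a minimal sketch of the symlink split in option (3), assuming users are bucketed by the first hex digit of a hash of their login name. The function names and the choice of md5 are just for illustration; any stable hash (or simply the first letter of the mailbox name) will do.

    import hashlib
    import os

    def mail_bucket(user):
        # first hex digit of the hash picks one of sixteen buckets, 0-f
        first_hex = hashlib.md5(user.encode()).hexdigest()[0]
        return os.path.join("/var/mail", first_hex)

    def maildir_path(user):
        return os.path.join(mail_bucket(user), user, "Maildir")

    # The bucket directories themselves are symlinks, so the IMAP server and
    # the MTA keep seeing a single /var/mail namespace while the I/O is
    # spread across the disks, e.g.:
    #
    #   /var/mail/0 -> /mail1/0   ...   /var/mail/7 -> /mail1/7
    #   /var/mail/8 -> /mail2/8   ...   /var/mail/f -> /mail2/f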
HTH,

Brian.