On Tue, Jan 23, 2007 at 01:35:35PM +0100, Jonas Thambert wrote:
> I'm using an Adaptec 2010S SCSI RAID card. I have tried
> and tweaked the courier imap server the best I can
> without any luck.
...
> The sd1 disk has 140 t/s. CPU-load is nothing.

And "sd1" is actually a RAID array of some sort, rather than a single disk?

My guess is that 140 tps is a fundamental limit of your RAID array,
especially if you are running RAID 5. Try turning off your IMAP server and
running a benchmark like bonnie++ (in ports) to establish this.

An IMAP server generates lots of random file reads and writes, with a
relatively high proportion of file creations, writes and deletions.

Many people don't seem to realise that a RAID 5 array has far *worse* write
performance than a single disk. In a basic RAID 5 array, a single 'write
block' operation actually takes 4 disk transactions across 2 disks:

1a. read the old data block
1b. read the old parity block
2. calculate the new parity (= old parity ^ old data ^ new data)
3a. write the new data block
3b. write the new parity block

1a and 1b can take place concurrently on the two disks, but step 2 can't be
done until both 1a and 1b are complete. Steps 3a and 3b can be speeded up by
writing via a battery-backed cache, but the rest is laws-of-physics stuff.
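The parity arithmetic in step 2 is plain XOR, which can be demonstrated in a few lines (a toy sketch of the maths, not anything a real controller's firmware looks like):

```python
# Demonstration of the RAID 5 parity update in step 2:
#   new parity = old parity ^ old data ^ new data
# Blocks are modelled as short byte strings; a real array works on
# whole stripe units, but the arithmetic is identical.

def xor_blocks(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length blocks byte by byte."""
    return bytes(x ^ y for x, y in zip(a, b))

# A toy 3-disk array: two data blocks plus their parity block.
data0 = b"\x0f\x0f\x0f\x0f"
data1 = b"\xf0\x00\xf0\x00"
parity = xor_blocks(data0, data1)

# Rewrite data0 without touching data1: read old data and old parity
# (steps 1a/1b), recompute parity (step 2), write both back (3a/3b).
new_data0 = b"\xff\x00\xff\x00"
new_parity = xor_blocks(xor_blocks(parity, data0), new_data0)

# The shortcut yields the same parity as recomputing from scratch,
# which is why the controller never has to read data1 at all.
assert new_parity == xor_blocks(new_data0, data1)
```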

Some solutions you can consider are:

(1) Use mirroring (RAID 1) instead of RAID 5, since disks are cheap.

With RAID 1, a write operation simply has to write the same data block to
both disks, which happens concurrently. You also get double the number of
read operations per second, since you have two copies of the data, so one
client can be searching for a block while a second client searches for
another block on the other disk.

(2) Use a filesystem which intrinsically coalesces writes. The best example
I can cite is the Network Appliance WAFL filesystem. NetApps give extremely
good performance but are very expensive (although worth it, IMO).

Sun's ZFS looks to be an upcoming contender in this space: you could build a
fileserver using OpenSolaris + ZFS + NFS, or wait for the FreeBSD port of
ZFS, which is nearing completion. There's no option for OpenBSD that I'm
aware of, though :-(

(3) Divide your users' mail directories across multiple disks or RAID sets,
either with a database, or even symlinks (e.g. /var/mail/0-7 are symlinked
to one disk, /var/mail/8-f are symlinked to another)
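One hedged sketch of that split, hashing each username to a backing volume (the mount points and the choice of MD5 are purely illustrative, not a Courier convention):

```python
# Split mailboxes across two volumes by the first hex digit of a hash
# of the username, mirroring the /var/mail/0-7 vs /var/mail/8-f
# example above. Mount points are hypothetical.
import hashlib

VOLUMES = {"0-7": "/disk1/mail", "8-f": "/disk2/mail"}

def mail_volume(user: str) -> str:
    """Pick a backing volume from the first hex digit of an MD5 hash."""
    digit = hashlib.md5(user.encode()).hexdigest()[0]
    return VOLUMES["0-7"] if digit < "8" else VOLUMES["8-f"]
```

A one-off script would then create the symlinks, e.g. `os.symlink(mail_volume(u) + "/" + u, "/var/mail/" + u)` for each mailbox, so the IMAP server never needs to know about the split.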

This is better than striping IMO. For example, if you have six disks, I'd
recommend three mirrored pairs mounted on /mail1, /mail2 and /mail3, rather
than striping-over-mirroring or vice versa. Then if you lose a pair of
disks, at least two-thirds of your mail is unaffected.

(4) Off-load other disk operations to another disk.

Now, you don't say much about your IMAP cluster, but presumably it receives
incoming mail using SMTP or LMTP. This means you have an MTA (e.g. sendmail,
postfix, exim etc) which accepts the mail.

This MTA will need its own spool directory where it stores a copy of each
incoming message until it has been successfully delivered into its final
place. Put this on a different disk to speed things up (you can get away
with a single disk, if you are prepared to accept the small risk of a
handful of messages being lost if this disk fails). The spool directory is
often a heavy offender because for each incoming message there will be a 
  create - write - sync - read - delete
sequence of operations.
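That lifecycle can be sketched with POSIX-style calls (filenames are made up; each fsync() is a forced disk transaction, which is what drives up the tps on the spool disk):

```python
# The spool lifecycle above: create - write - sync - read - delete.
import os
import tempfile

spool = tempfile.mkdtemp()                    # stand-in for the MTA spool
path = os.path.join(spool, "msg.00001")       # hypothetical queue file name

fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o600)  # create
os.write(fd, b"Subject: test\n\nhello\n")            # write
os.fsync(fd)                                         # sync: wait for the disk
os.close(fd)

with open(path, "rb") as f:                          # read back for delivery
    message = f.read()

os.unlink(path)                                      # delete after delivery
os.rmdir(spool)
```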

For additional performance, Exim has the ability to split its spool
directory across a number of subdirectories, which you can symlink to
multiple disks.
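In Exim that is the split_spool_directory main-configuration option; when set, queued messages are hashed into single-character subdirectories under the spool's input/ directory, each of which can be a symlink to a different disk (a minimal fragment; check your Exim version's documentation for the exact layout):

```
# Exim main configuration: spread the queue across subdirectories
split_spool_directory = true
```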

Putting the MTA spool directory on a battery-backed RAM disk is best of all.

At the very least, separating things out this way will make it clear in the tps
figures how much is due to the MTA spooling and how much due to operations
in the users' mailstores.

HTH,

Brian.
