We have an IMAP server, with ZFS for mailbox storage, that has
recently become extremely slow on most weekday mornings and
afternoons.  When one of these incidents happens, the number of
processes increases and the load average increases, but ZFS I/O
bandwidth decreases.  Users notice very slow responses to IMAP
requests.  On the server, even `ps' becomes slow.

We've tried a number of things, most of which made an improvement,
but the problem still occurs.  The ZFS ARC size was about 10 GB, but
shrank to 1 GB when the server was busy.  In fact, the server was
unusable when that happened.  Upgrading memory from 16 GB to 64 GB
certainly made a difference: the ARC size is always over 30 GB now.
Next, we limited the number of `lmtpd' (local delivery) processes to
64.  With those two changes, the server still became very slow at busy
times, but no longer became unresponsive.  The final change was to
disable ZFS prefetch.  It's not clear whether this made an improvement.
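
To see whether the ARC still shrinks during an incident, it may help to
log its size over the day rather than checking it by hand.  The Python
sketch below is only an illustration, not something we actually run: it
assumes the parseable output of Solaris `kstat -p', where each line is
"module:instance:name:statistic" followed by whitespace and the value,
and it reads the `zfs:0:arcstats:size' statistic (in bytes).  The
function names and the sampling interval are hypothetical.

```python
import subprocess
import time

ARC_KSTAT = "zfs:0:arcstats:size"  # ZFS ARC size statistic, in bytes


def parse_arc_size(kstat_output):
    """Return the ARC size in bytes from `kstat -p` output."""
    for line in kstat_output.splitlines():
        # Parseable kstat lines look like "module:instance:name:statistic<whitespace>value".
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0] == ARC_KSTAT:
            return int(parts[1])
    raise ValueError("arcstats:size not found in kstat output")


def sample_arc_gib(count=60, interval_s=60):
    """Hypothetical helper: print the ARC size in GiB every interval_s seconds.

    Only works on a system where the `kstat` command exists (e.g. Solaris).
    """
    for _ in range(count):
        out = subprocess.run(
            ["kstat", "-p", ARC_KSTAT],
            capture_output=True, text=True, check=True,
        ).stdout
        gib = parse_arc_size(out) / 2**30
        print(time.strftime("%H:%M:%S"), f"ARC {gib:.1f} GiB", flush=True)
        time.sleep(interval_s)
```

Keeping the parsing separate from the `kstat' call means the parser can
be checked against saved output on any machine.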

The server is a T2000 running Solaris 10.  It's a Cyrus murder back-
end, essentially only an IMAP server.  We did recently upgrade the
front-end, from a 4-CPU SPARC box to a 16-core Intel box with more
memory, also running Solaris 10.  The front-end runs sendmail and
proxies IMAP and POP connections to the back-end, and also forwards
SMTP for local deliveries to the back-end, using LMTP.

Cyrus runs thousands of `imapd' processes, with many `pop3d' and
`lmtpd' processes as well.  This should be an ideal workload for a
Niagara box.  All of these memory-map several moderate-sized
databases, both Berkeley DB and skiplist types, and occasionally
update those databases.  Our EMC Networker client also often runs
during the day, doing backups.  All of the IMAP mailboxes reside on
six ZFS filesystems, using a single 2-TB pool.  It's only 51% occupied
at the moment.

Many other layers are involved in this server.  We use scsi_vhci for
redundant I/O paths and Sun's iSCSI initiator to connect to the
storage on our NetApp filer.  The kernel plays a part as well.  How
do we determine which layer is responsible for the slow performance?
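
As a rough first cut at separating the layers, one could compare the
active service time (`asvc_t') that `iostat -xn' reports for the iSCSI
LUNs with how slow ZFS appears: high service times on the LUNs would
point below ZFS (the iSCSI path or the filer), while low service times
during a slow period would point more toward ZFS or the kernel.  The
Python sketch below only illustrates that check; it assumes the
Solaris `iostat -xn' extended-statistics layout (header row ending in
"device"), and the 30 ms threshold and the function name are
hypothetical.

```python
from typing import List, Tuple


def slow_devices(iostat_xn: str, threshold_ms: float = 30.0) -> List[Tuple[str, float]]:
    """Return (device, asvc_t) pairs whose active service time exceeds threshold_ms.

    Expects text in the format printed by Solaris `iostat -xn`; the
    threshold is an arbitrary, hypothetical cut-off.
    """
    slow = []
    columns = None
    for line in iostat_xn.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[-1] == "device":
            # Header row: r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
            columns = fields
            continue
        if columns is None or len(fields) != len(columns):
            # Skip banner lines such as "extended device statistics".
            continue
        row = dict(zip(columns, fields))
        asvc = float(row["asvc_t"])
        if asvc > threshold_ms:
            slow.append((row["device"], asvc))
    return slow
```

Output captured with something like `iostat -xn 10' during a slow
period could be passed in directly, since the header is repeated for
each interval and each data row is checked against the most recent one.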

-- 
-Gary Mills-    -Unix Support-    -U of M Academic Computing and Networking-
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss