[perf-discuss cc'd]

On Sat, Apr 18, 2009 at 4:27 PM, Gary Mills <mi...@cc.umanitoba.ca> wrote:
> Many other layers are involved in this server.  We use scsi_vhci for
> redundant I/O paths and Sun's iSCSI initiator to connect to the
> storage on our NetApp filer.  The kernel plays a part as well.  How
> do we determine which layer is responsible for the slow performance?

Have you disabled the Nagle algorithm for the iSCSI initiator?

http://bugs.opensolaris.org/view_bug.do?bug_id=6772828

Also, you may want to consider doing backups from the NetApp rather
than from the Solaris box.  Assuming all of your LUNs are in the same
volume on the filer, a snapshot should be a crash-consistent image of
the zpool.  You could verify this by making the snapshot writable and
trying to import the snapshotted LUNs on another host.  Anyway, this would
remove the backup-related stress on the T2000.  You can still do
snapshots at the ZFS layer to give you file level restores.  If the
NetApp caught on fire, you would simply need to restore the volume
containing the LUNs (presumably a small collection of large files)
which would go a lot quicker than a large collection of small files.

Since iSCSI is in the mix, you should also be sure that your network
is appropriately tuned.  Assuming that you are using the onboard
e1000g NICs, be sure that none of the "bad" counters are incrementing:

$ kstat -p e1000g | nawk '$0 ~ /err|drop|fail|no/ && $NF != 0'

If this gives any output, and the values keep growing between runs,
there is likely something amiss with your network.
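
Since the one-liner above only shows counters that are non-zero, not
ones that are still climbing, here is a rough sketch of comparing two
snapshots.  The function name counters_increased and the AWK variable
are my own inventions, and it assumes the usual "kstat -p" layout of
module:instance:name:statistic followed by the value.  Treat it as a
starting point rather than a finished tool.

```shell
# Print "bad" counters whose value grew between two saved "kstat -p" snapshots.
# Hypothetical helper; set AWK=nawk on Solaris, where plain awk is too old.
counters_increased() {
    ${AWK:-awk} '
        # First file: remember each statistic name and its value.
        NR == FNR { before[$1] = $NF; next }
        # Second file: report suspicious counters that increased.
        $1 ~ /err|drop|fail|no/ && ($1 in before) && ($NF + 0) > (before[$1] + 0) {
            printf "%s +%d\n", $1, $NF - before[$1]
        }
    ' "$1" "$2"
}

# Example use on the mail server (paths are arbitrary):
#   kstat -p e1000g > /tmp/kstat.before
#   sleep 60
#   kstat -p e1000g > /tmp/kstat.after
#   counters_increased /tmp/kstat.before /tmp/kstat.after
```

No output over a busy interval is the good case.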

The output from "iostat -xCn 10" could be interesting as well.  If
asvc_t is high (more than about 30 ms?), the filer is being slow to
respond.  If wsvc_t is frequently non-zero, there is some sort of
bottleneck that prevents the server from sending requests to the filer.
Perhaps you have tuned ssd_max_throttle, or Solaris has backed off
because the filer said to slow down.  (This assumes that ssd is the
driver used for the iSCSI LUNs.)
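
If staring at iostat output gets tedious, something like the following
could pick out the interesting rows.  It is only a sketch: the function
name flag_slow_luns is made up, the threshold of 30 is the same guess as
above, and it assumes the usual "iostat -xn" column order in which
wsvc_t is the 7th field, asvc_t the 8th and the device name the 11th.

```shell
# Flag iostat -xn rows with slow service times or host-side queueing.
# Hypothetical helper; set AWK=nawk on Solaris, where plain awk lacks -v.
flag_slow_luns() {
    ${AWK:-awk} -v max_asvc="${1:-30}" '
        # Data rows have 11 fields and start with a number; this skips headers.
        NF == 11 && $1 ~ /^[0-9.]+$/ {
            if ($8 + 0 > max_asvc + 0) print $11, "asvc_t=" $8, "(slow response)"
            if ($7 + 0 > 0)            print $11, "wsvc_t=" $7, "(queued on host)"
        }
    '
}

# Example use; the first report is the average since boot and can be ignored:
#   iostat -xCn 10 | flag_slow_luns 30
```

Rows flagged for asvc_t point toward the filer or the network, while
rows flagged for wsvc_t point toward queueing on the Solaris side.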

What else is happening on the filer when mail gets slow?  That is, are
you experiencing slowness due to a mail peak or due to some research
project that happens to be on the same spindles?  What does the
network look like from the NetApp side?

Are the mail server and the NetApp attached to the same switch, or are
they at opposite ends of the campus?  Is there something between them
that is misbehaving?

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss