Hi Charles,

You might want to rule out hardware issues first...
You can review iostat -En output or the /var/adm/messages file to see whether any driver-related error messages correlate with the hangs, like this:

  c4t40d0  Soft Errors: 7  Hard Errors: 0  Transport Errors: 0
  Vendor: SUN  Product: StorEdge 3510  Revision: 327P  Serial No:
  Size: 48.20GB <48201990144 bytes>

In addition, FMA will report disk issues in fmdump output. For example, you could grep for some of the devices in the pool like this:

  # fmdump -eV | grep c1t9d0
     vdev_path = /dev/dsk/c1t9d0s0
     vdev_path = /dev/dsk/c1t9d0s0

If you get output like the above, take a closer look at the full fmdump -eV output to see what is happening at the disk level.

Thanks,

Cindy

On 08/30/10 10:02, Charles J. Knipe wrote:
Howdy,

We're having a ZFS performance issue over here that I was hoping you guys could help me troubleshoot.

We have a ZFS pool made up of 24 disks, arranged into 7 raid-z devices of 4 disks each. We're using it as an iSCSI back-end for VMware and some Oracle RAC clusters. Under normal circumstances performance is very good, both in benchmarks and under real-world use. Every couple of days, however, I/O seems to hang for anywhere between several seconds and several minutes. The hang seems to be a complete stop of all write I/O. The following zpool iostat output illustrates:

  pool0       2.47T  5.13T    120      0   293K      0
  pool0       2.47T  5.13T    127      0   308K      0
  pool0       2.47T  5.13T    131      0   322K      0
  pool0       2.47T  5.13T    144      0   347K      0
  pool0       2.47T  5.13T    135      0   331K      0
  pool0       2.47T  5.13T    122      0   295K      0
  pool0       2.47T  5.13T    135      0   330K      0

While this is going on our VMs all hang, as do any "zfs create" commands or attempts to touch/create files in the ZFS pool from the local system. After several minutes the system "un-hangs" and we see very high write rates before things return to normal across the board.

Some more information about our configuration: we're running OpenSolaris snv_134, with ZFS at version 22. Our disks are 15K RPM 300GB Seagate Cheetahs, mounted in Promise J610S Dual enclosures, hanging off a Dell SAS 5/E controller. We had tried out most of this configuration previously on OpenSolaris 2009.06 without running into this problem. The only thing that's new, aside from the newer OpenSolaris/ZFS, is a set of four SSDs configured as log disks.

At first we blamed de-dupe, but we've disabled that. Next we suspected the SSD log disks, but we've seen the problem with those removed as well.

Has anyone seen anything like this before? Are there any tools we can use to gather information during the hang which might be useful in determining what's going wrong?

Thanks for any insights you may have.

-Charles
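To automate the fmdump check above across all of the pool's disks, a small loop like the following might help. This is just a sketch: the device names are examples, so substitute the disks actually shown by "zpool status" on your system.

```shell
# Check each suspect disk for FMA error events. The device names below
# are examples -- substitute the ones listed by "zpool status".
for d in c1t9d0 c1t10d0 c1t11d0; do
  echo "== $d =="
  # grep the FMA error-event log for this device; note when nothing is found
  fmdump -eV 2>/dev/null | grep "$d" || echo "no FMA events for $d"
done
```

Any device that keeps showing up here is a good candidate for a closer look in the full fmdump -eV output.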
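On the question of gathering information during the hang: one low-tech approach is to leave a timestamped capture loop running, so the stall window is bracketed in a log you can inspect afterwards. A rough sketch, where the log path, sample count, and interval are arbitrary choices (raise the count for a real capture, and swap in whatever stat commands you prefer):

```shell
# Timestamped per-device I/O capture. Log path, count, and interval are
# arbitrary -- raise the count of 5 for a longer capture window.
LOG=/var/tmp/hang-capture.log
for i in $(seq 1 5); do
  date '+%Y-%m-%d %H:%M:%S' >> "$LOG"
  # per-device latency snapshot; a single slow disk should stand out
  iostat -xn 1 1 >> "$LOG" 2>&1
  sleep 1
done
```

If the per-device service times stay low while the pool's write throughput drops to zero, that would point away from a single bad disk and toward something higher in the stack.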
_______________________________________________ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss