Hi Charles,

You might want to rule out hardware issues first...

You can review iostat -En output or the /var/adm/messages file to see
whether any driver-related error messages correlate with the hangs, like this:

c4t40d0          Soft Errors: 7 Hard Errors: 0 Transport Errors: 0
Vendor: SUN      Product: StorEdge 3510    Revision: 327P Serial No:
Size: 48.20GB <48201990144 bytes>
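
If you have a lot of disks, eyeballing that output gets tedious. Here is a
minimal awk sketch that flags devices with nonzero hard or transport error
counters. It runs against a saved copy of the output (the file name and the
sample device lines below are just made-up examples; in practice you would
pipe the real command, e.g. iostat -En | awk ...):

```shell
# Sample data mimicking 'iostat -En' summary lines (hypothetical devices).
cat <<'EOF' > iostat-en.txt
c4t40d0  Soft Errors: 7 Hard Errors: 0 Transport Errors: 0
c1t9d0   Soft Errors: 0 Hard Errors: 3 Transport Errors: 1
EOF

# Print the device name when the Hard ($7) or Transport ($10) error
# counter is nonzero; field positions assume the standard summary layout.
awk '/Errors:/ { if ($7 > 0 || $10 > 0) print $1 }' iostat-en.txt
```

Here that prints only c1t9d0, since c4t40d0 shows soft errors alone, which
are usually less alarming.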

In addition, FMA will report disk issues in fmdump output.

For example, you could grep for some of the devices in the pool
like this:

# fmdump -eV | grep c1t9d0
        vdev_path = /dev/dsk/c1t9d0s0
        vdev_path = /dev/dsk/c1t9d0s0


If you get output like the above, take a closer look at the full
fmdump -eV output to see what is happening at the disk level.
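
A quick way to gauge how often each suspect device shows up is to count
matches in a saved copy of the output. A sketch (the capture file name and
sample lines are assumptions for illustration; normally you would save the
real output with fmdump -eV > fmdump.txt):

```shell
# Sample data mimicking vdev_path lines from 'fmdump -eV' output.
cat <<'EOF' > fmdump.txt
        vdev_path = /dev/dsk/c1t9d0s0
        vdev_path = /dev/dsk/c1t9d0s0
        vdev_path = /dev/dsk/c1t10d0s0
EOF

# Count error records mentioning one disk.
grep -c 'c1t9d0' fmdump.txt
```

A device that dominates the counts is a good first candidate for a closer
look or replacement.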

Thanks,

Cindy


On 08/30/10 10:02, Charles J. Knipe wrote:
Howdy,

We're having a ZFS performance issue over here that I was hoping you guys could 
help me troubleshoot.  We have a ZFS pool made up of 24 disks, arranged into 7 
raid-z devices of 4 disks each.  We're using it as an iSCSI back-end for VMWare 
and some Oracle RAC clusters.

Under normal circumstances performance is very good both in benchmarks and 
under real-world use.  Every couple days, however, I/O seems to hang for 
anywhere between several seconds and several minutes.  The hang seems to be a 
complete stop of all write I/O.  The following zpool iostat illustrates:

               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
pool0       2.47T  5.13T    120      0   293K      0
pool0       2.47T  5.13T    127      0   308K      0
pool0       2.47T  5.13T    131      0   322K      0
pool0       2.47T  5.13T    144      0   347K      0
pool0       2.47T  5.13T    135      0   331K      0
pool0       2.47T  5.13T    122      0   295K      0
pool0       2.47T  5.13T    135      0   330K      0
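
(For what it's worth, that stall signature, reads continuing while write ops
sit at zero, is easy to spot mechanically in captured samples. A sketch,
assuming the usual zpool iostat column order of name, alloc, free, read/write
ops, read/write bandwidth; the capture file and numbers are examples only:)

```shell
# Sample data mimicking 'zpool iostat pool0 1' lines (no header).
cat <<'EOF' > zpool-iostat.txt
pool0  2.47T  5.13T  120  0  293K  0
pool0  2.47T  5.13T  135  12  330K  1.2M
EOF

# Flag one-second samples where write ops ($5) are zero.
awk '$5 == 0 { print NR": writes stalled ("$4" read ops)" }' zpool-iostat.txt
```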

While this is going on our VMs all hang, as do any "zfs create" commands or attempts to 
touch/create files in the zfs pool from the local system.  After several minutes the system 
"un-hangs" and we see very high write rates before things return to normal across the 
board.

Some more information about our configuration:  We're running OpenSolaris 
snv_134.  ZFS is at version 22.  Our disks are 15K RPM 300 GB Seagate Cheetahs, 
mounted in Promise J610S Dual enclosures, hanging off a Dell SAS 5/e 
controller.  We'd tried out most of this configuration previously on 
OpenSolaris 2009.06 without running into this problem.  The only thing that's 
new, aside from the newer OpenSolaris/ZFS, is a set of four SSDs configured as 
log devices.

At first we blamed de-dupe, but we've disabled that.  Next we suspected the SSD 
log disks, but we've seen the problem with those removed, as well.

Has anyone seen anything like this before?  Are there any tools we can use to 
gather information during the hang which might be useful in determining what's 
going wrong?

Thanks for any insights you may have.

-Charles

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
