Hi Charles,

You might want to rule out hardware issues first...
You can review iostat -En output or the /var/adm/messages file to see whether any driver-related error messages correlate with the hangs, like this:

  c4t40d0  Soft Errors: 7  Hard Errors: 0  Transport Errors: 0
  Vendor: SUN  Product: StorEdge 3510  Revision: 327P  Serial No:
  Size: 48.20GB <48201990144 bytes>

In addition, FMA will report disk issues in fmdump output. For example, you could grep for some of the devices in the pool like this:

  # fmdump -eV | grep c1t9d0
     vdev_path = /dev/dsk/c1t9d0s0
     vdev_path = /dev/dsk/c1t9d0s0

If you get output like the above, take a closer look at the full fmdump -eV output to see what is happening at the disk level.

Thanks,

Cindy

On 08/30/10 10:02, Charles J. Knipe wrote:
Howdy,

We're having a ZFS performance issue over here that I was hoping you guys could help me troubleshoot.

We have a ZFS pool made up of 24 disks, arranged into 7 raid-z devices of 4 disks each. We're using it as an iSCSI back-end for VMware and some Oracle RAC clusters. Under normal circumstances performance is very good, both in benchmarks and under real-world use. Every couple of days, however, I/O seems to hang for anywhere between several seconds and several minutes. The hang seems to be a complete stop of all write I/O. The following zpool iostat output illustrates:

  pool0       2.47T  5.13T    120      0   293K      0
  pool0       2.47T  5.13T    127      0   308K      0
  pool0       2.47T  5.13T    131      0   322K      0
  pool0       2.47T  5.13T    144      0   347K      0
  pool0       2.47T  5.13T    135      0   331K      0
  pool0       2.47T  5.13T    122      0   295K      0
  pool0       2.47T  5.13T    135      0   330K      0

While this is going on our VMs all hang, as do any "zfs create" commands or attempts to touch/create files in the ZFS pool from the local system. After several minutes the system "un-hangs" and we see very high write rates before things return to normal across the board.

Some more information about our configuration: we're running OpenSolaris snv_134, with ZFS at version 22. Our disks are 15K RPM 300GB Seagate Cheetahs, mounted in Promise J610S Dual enclosures, hanging off a Dell SAS 5/E controller. We had tried out most of this configuration previously on OpenSolaris 2009.06 without running into this problem. The only thing that's new, aside from the newer OpenSolaris/ZFS, is a set of four SSDs configured as log disks.

At first we blamed de-dupe, but we've disabled that. Next we suspected the SSD log disks, but we've seen the problem with those removed as well.

Has anyone seen anything like this before? Are there any tools we can use to gather information during the hang which might be useful in determining what's going wrong?

Thanks for any insights you may have.

-Charles
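To automate the fmdump check above across all of the pool's disks, a small loop like the following might help. This is just a sketch: the device names are examples, so substitute the disks actually shown by "zpool status" on your system.

```shell
# Check each suspect disk for FMA error events. The device names below
# are examples -- substitute the ones listed by "zpool status".
for d in c1t9d0 c1t10d0 c1t11d0; do
  echo "== $d =="
  # grep the FMA error-event log for this device; note when nothing is found
  fmdump -eV 2>/dev/null | grep "$d" || echo "no FMA events for $d"
done
```

Any device that keeps showing up here is a good candidate for a closer look in the full fmdump -eV output.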
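On the question of gathering information during the hang: one low-tech approach is to leave a timestamped capture loop running, so the stall window is bracketed in a log you can inspect afterwards. A rough sketch, where the log path, sample count, and interval are arbitrary choices (raise the count for a real capture, and swap in whatever stat commands you prefer):

```shell
# Timestamped per-device I/O capture. Log path, count, and interval are
# arbitrary -- raise the count of 5 for a longer capture window.
LOG=/var/tmp/hang-capture.log
for i in $(seq 1 5); do
  date '+%Y-%m-%d %H:%M:%S' >> "$LOG"
  # per-device latency snapshot; a single slow disk should stand out
  iostat -xn 1 1 >> "$LOG" 2>&1
  sleep 1
done
```

If the per-device service times stay low while the pool's write throughput drops to zero, that would point away from a single bad disk and toward something higher in the stack.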
_______________________________________________ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss