[EMAIL PROTECTED] said:
> I think we found the choke point. The silver lining is that it isn't the
> T2000 or ZFS. We think it is the new SAN, an Hitachi AMS1000, which has
> 7200RPM SATA disks with the cache turned off. This system has a very small
> cache, and when we did turn it on for one of the replacement LUNs we saw a
> 10x improvement - until the cache filled up about 1 minute later (was using
> zpool iostat). Oh well.
We have experience with a T2000 connected to the HDS 9520V, the predecessor to the AMS arrays, with SATA drives, and it's likely that your AMS1000 SATA behaves similarly.

I didn't see whether you're using Sun's drivers to talk to the SAN/array, but we are running Solaris 10 (Sun drivers + MPxIO), and since the Hitachi storage isn't automatically recognized (sd/ssd, scsi_vhci), it took a fair amount of tinkering to get the parameters adjusted to work well with the HDS storage. The combination that has given us the best results with ZFS is:

 (a) Tell the array to ignore SYNCHRONIZE_CACHE requests from the host.
 (b) Balance drives within each AMS disk shelf across both array controllers.
 (c) Set the host's max queue depth to 4 for the SATA LUNs (sd/ssd driver).
 (d) Set the host's disable_disksort flag (sd/ssd driver) for the HDS LUNs.

Here's the reference we used for setting the parameters in Solaris 10 (a rough
sketch of our host-side settings is in the P.S. below):

  http://wikis.sun.com/display/StorageDev/Parameter+Configuration

Note that the AMS uses read-after-write verification on SATA drives, so for
writes you only get half the IOPS the drives are otherwise capable of. We've
found that small RAID volumes (e.g. a two-drive mirror) are unbelievably slow,
so you'd want to move toward more drives per RAID group, if possible.

Honestly, if I recall correctly what I saw in your "iostat" listings earlier,
your situation is not nearly as "bad" as with our older array. You don't seem
to be driving those HDS LUNs to the extreme busy states we have seen on our
9520V: it was not unusual for us to see LUNs at 100% busy, 100% wait, with 35
ops total in the "actv" and "wait" columns, and I don't recall seeing any
100%-busy devices in your logs.

But getting the FC queue-depth (max-throttle) setting to match what the
array's back end can actually handle greatly reduced the long "zpool status"
and other I/O-related hangs we were experiencing, and disabling the host-side
FC queue sorting greatly improved the overall latency of the system when busy.
Maybe it'll help yours too.

Regards,
Marion
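
P.S. In case it saves you some digging, here is roughly what the host-side
pieces of (c) and (d) look like for us on Solaris 10. Treat it as a sketch
rather than something to paste in: the "HITACHI DF600F" inquiry string is only
an example (check the actual vendor/product strings on your LUNs with
"iostat -En", and remember the vendor field is padded to 8 characters), the
name:value form of sd-config-list depends on your Solaris 10 update level, and
you should double-check all of it against the wiki page above and Hitachi's
MPxIO guidance for the AMS.

  # /kernel/drv/sd.conf (the ssd driver reads "ssd-config-list" from
  # ssd.conf instead): per-LUN queue depth of 4 and no host-side queue
  # sorting for the HDS LUNs, i.e. items (c) and (d) above.
  sd-config-list = "HITACHI DF600F",
      "throttle-max:4, disksort:false";

  * /etc/system: the blunt, global alternative for the queue depth, in
  * case your update level doesn't support the name:value properties.
  set ssd:ssd_max_throttle = 4

  # /kernel/drv/scsi_vhci.conf: the sort of entry scsi_vhci needs
  # before MPxIO will claim third-party LUNs.
  device-type-scsi-options-list =
      "HITACHI DF600F", "symmetric-option";
  symmetric-option = 0x1000000;

The driver .conf changes need a reconfiguration boot ("reboot -- -r", or
"touch /reconfigure" before rebooting) to take effect.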