Scott,

On Sun, 4 May 2008, Scott wrote:

>> Hi...
>>
>> Here's my system:
>>
>>   2 Intel 3 GHz 5160 dual-core CPUs
>>   10 SATA 750 GB disks running as a ZFS RAIDZ2 pool
>>   8 GB memory
>>   SunOS 5.11 snv_79a on a separate UFS mirror
>>   ZFS pool version 10
>>   No separate ZIL or ARC cache
>>
>> I ran into a problem today where the ZFS pool jammed for an extended
>> period of time.  During that time, it seemed read-bound doing only read
>> I/O's (as observed with "zpool iostat 1") and I saw 100% misses while
>> running arcstat.pl (for "miss%", "dm%", "pm%" and "mm%").  Processes
>> accessing the pool were jammed, including remote NFS mounts.  At the
>> time, I was: 1) running a scrub, 2) writing 10's of MB/sec of data onto
>> the pool as well as reading from the pool, and 3) deleting a large
>> number of files on the pool.  I tried killing one of the jammed "rm"
>> processes and it eventually died.  The # of misses seen in arcstat.pl
>> eventually dropped back down to the 20-40% range ("miss%").  A while
>> later, writes began occurring to the pool again, remote NFS access also
>> freed up, and overall system behaviour seemed to normalize.  This all
>> occurred over the course of approximately an hour.
>>
>> Does this kind of problem sound familiar to anyone?  Is it a ZFS
>> problem, or have I hit some sort of ZFS load maximum and this is the
>> response?  Any suggestions for ways to avoid this are welcome...
>>
>> Thanks...
>>
>>     Art
>>
>> Arthur A. Person
>> Research Assistant, System Administrator
>> Penn State Department of Meteorology
>> email: [EMAIL PROTECTED], phone: 814-863-1563
>> _______________________________________________
>> zfs-discuss mailing list
>> zfs-discuss@opensolaris.org
>> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
> Hi Art,
>
> I have seen a similar problem that is happening on several servers since
> a recent upgrade from b70 to b86/b87.  For no obvious reason, the
> servers will stop writing to the pool for long periods of time.
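As an aside on the arcstat.pl numbers quoted above: its "miss%" column is, roughly, the fraction of ARC lookups in each sampling interval that missed the cache.  The sketch below is my own simplified illustration of that ratio (the function name is mine, not arcstat's), just to show why a jammed, read-bound interval reports 100% and a healthy one reports something like 20-40%:

```python
# Hypothetical sketch of an arcstat-style "miss%" computed from
# per-interval ARC hit/miss counters.  Not the actual arcstat.pl code,
# which reads these counters from kstats.

def miss_pct(hits: int, misses: int) -> float:
    """Percentage of ARC lookups in an interval that missed the cache."""
    reads = hits + misses
    if reads == 0:
        return 0.0          # idle interval: report 0 rather than divide by zero
    return 100.0 * misses / reads

# A jammed, read-bound interval like the one described above:
print(miss_pct(hits=0, misses=5000))     # every lookup misses -> 100.0

# A normal interval after the pool recovers:
print(miss_pct(hits=7000, misses=3000))  # -> 30.0, in the 20-40% range
```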
> Watching a "zpool iostat", I can see that 0 writes are being done for up
> to a minute at a time.  Meanwhile, a large number of small (~3K) reads
> are happening.  The servers behave like this for an hour or more at a
> time.
>
> The server configuration is:
>
>   Dual-core Opteron 2212HE
>   4GB ECC DDR2 RAM
>   15 1TB SATA drives in a RAID-Z2 pool
>   2 Supermicro SAT2-MV8 controllers
>   SunOS 5.11 snv_86
>   UFS root and swap are on their own disk
>
> Have you made any progress with this problem?  Has anyone else seen
> this behavior?

I haven't seen it happen again, but I haven't hammered the system as I
did above to try and make it fail, either.  Since then, I have also added
two RiDATA 16GB SSDs, one as a log device and one as a cache device, to
see if I can improve performance into and out of the array.  Writing data
to the array has definitely improved with the log device, but I'm still
having performance issues reading large numbers of small files off the
array.

I'm curious about your array configuration above... did you create your
RAIDZ2 as one vdev or multiple vdevs?  If multiple, how many?  On mine, I
have all 10 disks set up as one RAIDZ2 vdev, which is supposed to be near
the performance limit... I'm wondering how much I would gain by splitting
it into two vdevs for the price of losing 1.5TB (2 disks) worth of
storage.

    Art

Arthur A. Person
Research Assistant, System Administrator
Penn State Department of Meteorology
email: [EMAIL PROTECTED], phone: 814-863-1563
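For what it's worth, the 1.5TB cost of splitting the pool follows from RAIDZ2 spending two disks' worth of space on parity per vdev.  A back-of-the-envelope check (my own helper, ignoring filesystem overhead, so real usable space will be somewhat less):

```python
# Usable-capacity comparison for the 10 x 750 GB pool above:
# one 10-disk RAIDZ2 vdev versus two 5-disk RAIDZ2 vdevs.
# RAIDZ2 devotes 2 disks per vdev to parity.

DISK_GB = 750

def raidz2_usable_gb(disks_per_vdev: int, vdevs: int = 1) -> int:
    """Usable space: (disks - 2 parity) per vdev, times the vdev count."""
    return (disks_per_vdev - 2) * DISK_GB * vdevs

one_vdev  = raidz2_usable_gb(10)          # 8 data disks      -> 6000 GB
two_vdevs = raidz2_usable_gb(5, vdevs=2)  # 2 x 3 data disks  -> 4500 GB
print(one_vdev, two_vdevs, one_vdev - two_vdevs)  # 6000 4500 1500
```

The 1500 GB difference is the "1.5TB (2 disks)" mentioned above; the tradeoff is that two vdevs give the pool two independent sets of spindles for small random I/O.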