Kevin,

Looking at the stats, I think the tank pool is about 80% full (2.09T used with only 640G free), and at that point you are possibly hitting bug 6596237 - "Stop looking and start ganging".
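To confirm the pool usage, something along these lines should show it directly (the CAP column of zpool list reports the percentage of the pool in use):

    # zpool list tank

Once allocation goes much beyond that level, ZFS has to work noticeably harder to find contiguous free space, which is the behaviour that bug is about.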
Also, there is another ZIL-related bug which worsens the case by fragmenting the free space: 6683293 - "concurrent O_DSYNC writes to a fileset can be much improved over NFS".

You could compare the disk usage with that of the other machine you have. It would also be useful to know what patch levels you are running: we do have IDRs for bug 6596237, and the other bug has been fixed in the official patches. For your question about a per-zfs-filesystem iostat, and for checking the patch level, see the notes below the quoted output.

Hope that helps.

Thanks and regards,
Sanjeev.

On Thu, Jan 29, 2009 at 01:13:29PM +0100, Kevin Maguire wrote:
> Hi
>
> We have been using a Solaris 10 system (Sun-Fire-V245) for a while as
> our primary file server. It is based on Solaris 10 06/06, plus patches
> up to approximately May 2007. It is a production machine and, until
> about a week ago, it has had few problems.
>
> Attached to the V245 is a SCSI RAID array, which presents one LUN to
> the OS. On this LUN is a zpool (tank), and within that 300+ ZFS file
> systems (one per user, for automounted home directories). The system
> is connected to our LAN via gigabit Ethernet; most of our NFS clients
> have only 100FD network connections.
>
> In recent days the performance of the file server seems to have gone
> off a cliff, and I don't know how to troubleshoot what might be wrong.
> Typical "zpool iostat 120" output is shown below. If I run "truss -D df"
> I see that each call to statvfs64("/tank/bla") takes 2-3 seconds. The
> RAID itself is healthy, and all disks are reporting as OK.
>
> I have tried to establish, via nfslogd, whether some client or clients
> are thrashing the server, but without seeing anything obvious. Is there
> some kind of per-zfs-filesystem iostat?
>
> End users are reporting that just saving small files can take 5-30
> seconds. prstat/top shows no process using significant CPU load. The
> system has 8GB of RAM, and vmstat shows nothing interesting.
>
> I have another V245 with the same SCSI/RAID/ZFS setup and a similar
> (though somewhat lighter) load of data and users, and the problem is
> NOT apparent there.
>
> Suggestions?
> Kevin
>
> Thu Jan 29 11:32:29 CET 2009
>                capacity     operations    bandwidth
> pool         used  avail   read  write   read  write
> ----------  -----  -----  -----  -----  -----  -----
> tank        2.09T   640G     10     66   825K  1.89M
> tank        2.09T   640G     39      5  4.80M   126K
> tank        2.09T   640G     38      8  4.73M   191K
> tank        2.09T   640G     40      5  4.79M   126K
> tank        2.09T   640G     39      5  4.73M   170K
> tank        2.09T   640G     40      3  4.88M  43.8K
> tank        2.09T   640G     40      3  4.87M  54.7K
> tank        2.09T   640G     39      4  4.81M   111K
> tank        2.09T   640G     39      9  4.78M   134K
> tank        2.09T   640G     37      5  4.61M   313K
> tank        2.09T   640G     39      3  4.89M  32.8K
> tank        2.09T   640G     35      7  4.31M   629K
> tank        2.09T   640G     28     13  3.47M  1.43M
> tank        2.09T   640G      5     51   433K  4.27M
> tank        2.09T   640G      6     51   450K  4.23M
> tank        2.09T   639G      5     52   543K  4.23M
> tank        2.09T   640G     26     57  3.00M  1.15M
> tank        2.09T   640G     39      6  4.82M   107K
> tank        2.09T   640G     39      3  4.80M   119K
> tank        2.09T   640G     38      8  4.64M   295K
> tank        2.09T   640G     40      7  4.82M   102K
> tank        2.09T   640G     43      5  4.79M   103K
> tank        2.09T   640G     39      4  4.73M   193K
> tank        2.09T   640G     39      5  4.87M  62.1K
> tank        2.09T   640G     40      3  4.88M  49.3K
> tank        2.09T   640G     40      3  4.80M   122K
> tank        2.09T   640G     42      4  4.83M  82.0K
> tank        2.09T   640G     40      3  4.89M  42.0K
> ...
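A couple of follow-up commands that may help with the questions above (a rough sketch: fsstat(1M) is only present on more recent Solaris 10 updates/patch levels, and /tank/<username> is just a placeholder for one of the home-directory filesystems):

    # cat /etc/release            <- shows which Solaris 10 update the system is on
    # showrev -p                  <- lists the installed patches
    # fsstat /tank/<username> 5   <- per-mounted-filesystem activity, the closest thing to a per-zfs-filesystem iostat

fsstat reports operation counts and bytes read/written per mount point, so running it against a handful of the busiest home directories may help identify a client that is thrashing the server.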