So... granted, tank is about 77% full (not to split hairs ;^), but in this case, 23% is 640GB of free space. I mean, it's not like 15 years ago when a file system was 2GB total and 23% free meant a measly 460MB to allocate from.
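Quick sanity check on that percentage, using the zpool iostat figures quoted further down (strictly back-of-the-envelope):

    2.09T used + 640G avail  ~=  2.72T total
    640G / 2.72T             ~=  23% free, i.e. ~77% full

which is consistent with what "zpool list tank" should be showing in its CAP column.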
640GB is a lot of space, and our largest writes are less than 5MB. I would hope we're not tripping over 6596237 ("stop looking and start ganging") yet, but I guess it's worth looking at. We should be able to figure this out with DTrace. Two scripts from CR 6495013 should help us determine whether this is the case (note: 6495013 was fixed in s10u4 and nv61). There's a quick sketch of how to run them at the bottom of this message, below the quoted iostat output.

-------------------- metaslab.d --------------------------------------

#pragma D option quiet

BEGIN
{
        self->in_metaslab = 0;
}

fbt::metaslab_ff_alloc:entry
/self->in_metaslab == 0/
{
        self->in_metaslab = 1;
        self->loopcount = 0;
}

fbt::avl_walk:entry
/self->in_metaslab/
{
        self->loopcount++;
}

fbt::metaslab_ff_alloc:return
/self->in_metaslab/
{
        self->in_metaslab = 0;
        @loops["Loops count"] = quantize(self->loopcount);
        self->loopcount = 0;
}

----- metaslab_size.d -------------------------

#!/usr/sbin/dtrace -s

#pragma D option quiet

BEGIN
{
        self->in_metaslab = 0;
}

fbt::metaslab_ff_alloc:entry
/self->in_metaslab == 0/
{
        self->in_metaslab = 1;
        @sizes["metaslab sizes"] = quantize(arg1);
}

fbt::metaslab_ff_alloc:return
/self->in_metaslab/
{
        self->in_metaslab = 0;
}

Sanjeev wrote:
> Kevin,
>
> Looking at the stats I think the tank pool is about 80% full.
> And at this point you are possibly hitting the bug:
> 6596237 - "Stop looking and start ganging"
>
> Also, there is another ZIL-related bug which worsens the case
> by fragmenting the space:
> 6683293 concurrent O_DSYNC writes to a fileset can be much improved over NFS
>
> You could compare the disk usage of the other machine that you have.
>
> Also, it would be useful to know what patch levels you are running.
>
> We do have IDRs for bug #6596237, and the other bug has been
> fixed in the official patches.
>
> Hope that helps.
>
> Thanks and regards,
> Sanjeev.
>
> On Thu, Jan 29, 2009 at 01:13:29PM +0100, Kevin Maguire wrote:
>
>> Hi
>>
>> We have been using a Solaris 10 system (Sun-Fire-V245) for a while as
>> our primary file server. This is based on Solaris 10 06/06, plus
>> patches up to approx May 2007. It is a production machine, and until
>> about a week ago has had few problems.
>>
>> Attached to the V245 is a SCSI RAID array, which presents one LUN to
>> the OS. On this LUN is a zpool (tank), and within that 300+ zfs file
>> systems (one per user for automounted home directories). The system is
>> connected to our LAN via gigabit Ethernet; most of our NFS clients
>> have just a 100FD network connection.
>>
>> In recent days performance of the file server seems to have gone off a
>> cliff, and I don't know how to troubleshoot what might be wrong. Typical
>> "zpool iostat 120" output is shown below. If I run "truss -D df" I see
>> each call to statvfs64("/tank/bla") takes 2-3 seconds. The RAID itself
>> is healthy, and all disks are reporting as OK.
>>
>> I have tried to establish if some client or clients are thrashing the
>> server via nfslogd, but without seeing anything obvious. Is there
>> some kind of per-zfs-filesystem iostat?
>>
>> End users are reporting that just saving small files can take 5-30 seconds.
>> prstat/top shows no process using significant CPU load. The system
>> has 8GB of RAM, and vmstat shows nothing interesting.
>>
>> I have another V245, with the same SCSI/RAID/zfs setup, and a similar
>> (though somewhat smaller) load of data and users, where this problem is
>> NOT apparent.
>>
>> Suggestions?
>>
>> Kevin
>>
>> Thu Jan 29 11:32:29 CET 2009
>>                capacity     operations    bandwidth
>> pool         used  avail   read  write   read  write
>> ----------  -----  -----  -----  -----  -----  -----
>> tank        2.09T   640G     10     66   825K  1.89M
>> tank        2.09T   640G     39      5  4.80M   126K
>> tank        2.09T   640G     38      8  4.73M   191K
>> tank        2.09T   640G     40      5  4.79M   126K
>> tank        2.09T   640G     39      5  4.73M   170K
>> tank        2.09T   640G     40      3  4.88M  43.8K
>> tank        2.09T   640G     40      3  4.87M  54.7K
>> tank        2.09T   640G     39      4  4.81M   111K
>> tank        2.09T   640G     39      9  4.78M   134K
>> tank        2.09T   640G     37      5  4.61M   313K
>> tank        2.09T   640G     39      3  4.89M  32.8K
>> tank        2.09T   640G     35      7  4.31M   629K
>> tank        2.09T   640G     28     13  3.47M  1.43M
>> tank        2.09T   640G      5     51   433K  4.27M
>> tank        2.09T   640G      6     51   450K  4.23M
>> tank        2.09T   639G      5     52   543K  4.23M
>> tank        2.09T   640G     26     57  3.00M  1.15M
>> tank        2.09T   640G     39      6  4.82M   107K
>> tank        2.09T   640G     39      3  4.80M   119K
>> tank        2.09T   640G     38      8  4.64M   295K
>> tank        2.09T   640G     40      7  4.82M   102K
>> tank        2.09T   640G     43      5  4.79M   103K
>> tank        2.09T   640G     39      4  4.73M   193K
>> tank        2.09T   640G     39      5  4.87M  62.1K
>> tank        2.09T   640G     40      3  4.88M  49.3K
>> tank        2.09T   640G     40      3  4.80M   122K
>> tank        2.09T   640G     42      4  4.83M  82.0K
>> tank        2.09T   640G     40      3  4.89M  42.0K
>> ...
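For reference, here is a minimal sketch of how I'd run the two DTrace scripts quoted near the top of this message (assuming they are saved as metaslab.d and metaslab_size.d on the file server; the 60-second window is arbitrary):

    # run as root; -c bounds the run, and the quantize() aggregations
    # are printed automatically when the child command exits
    dtrace -s metaslab.d -c 'sleep 60'
    dtrace -s metaslab_size.d -c 'sleep 60'

You can also drop the -c option and just interrupt with Ctrl-C once enough write activity has gone through; the histograms print on exit either way. A long tail in the "Loops count" histogram would suggest the allocator really is walking a lot of free-space AVL entries per allocation, which is the 6596237 symptom we're hoping not to see.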