Tracked it down to about three gvfsd-metadata processes, maybe; I can't decide whether they were victims or root causes. Shooting those in the head brought things back. I still didn't see our DCS3700s get buried, though; it looked to me as if pool I/O was effectively blocked, so I don't know whether the DDRdrives would have had any effect.

I would still like to be edumacated on a way to get a bit more insight into what the pool was busy waiting on while the spindles were so idle. I have no doubt NFS was suffering, but my number of threads was not at max and the system was relatively idle; I just couldn't get anything written to disk in a timely fashion.
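Something along these lines might be a first cut at that -- a minimal DTrace sketch, assuming fbt probes on zil_commit() and spa_sync() are available on your build (fbt traces raw kernel functions rather than a stable provider, so treat the probe names as assumptions, not a verified recipe):

#!/usr/sbin/dtrace -s

#pragma D option quiet

/* How long callers (e.g. NFS server threads doing sync writes) sit in
 * zil_commit(). Probe name is an assumption about this kernel build. */
fbt::zil_commit:entry
{
        self->zc = timestamp;
}

fbt::zil_commit:return
/self->zc/
{
        @zil["zil_commit latency (ns)"] = quantize(timestamp - self->zc);
        self->zc = 0;
}

/* How long each txg sync takes; long spa_sync() times while the spindles
 * look idle would point away from raw disk bandwidth. */
fbt::spa_sync:entry
{
        self->ss = timestamp;
}

fbt::spa_sync:return
/self->ss/
{
        @txg["spa_sync duration (ns)"] = quantize(timestamp - self->ss);
        self->ss = 0;
}

profile:::tick-10s
{
        printa(@zil);
        printa(@txg);
        trunc(@zil);
        trunc(@txg);
}

If zil_commit() latency blows out while the leaf vdevs stay mostly idle, the waiting is presumably happening upstream of the disks (write throttle, allocation, lock contention) rather than in the slog devices themselves.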
J

On 08 May 13:10, jason matthews wrote:
>
> sounds like it is blocking on NFS :-)
>
> Ask Chris for a try/buy DDRdrive X1 or whatever the latest
> concoction is... it could be life changing for you.
>
> j.
>
> On 5/8/15 11:32 AM, Joe Hetrick wrote:
> >Today I played a bit with set sync=disabled after watching a few f/s write
> >IOPs. I can't decide if I've found a particular group of users with a new
> >(more abusive) set of jobs.
> >
> >I'm looking more and more, and I've turned sync off on a handful of
> >filesystems that are showing a high number of sustained write I/Os; when
> >those filesystems are bypassing the ZIL, everything is happy. The ZIL
> >devices are never in %w, the pool %b coincides with spindle %b (which is
> >almost never higher than 50 or so), and things are streaming nicely.
> >
> >Does anyone have any dtrace that I could use to poke into just what the
> >pool is blocking on when these others are in play? Looking at nfsv3
> >operations, I see a very large number of
> >  create
> >  setattr
> >  write
> >  modify
> >  rename
> >and sometimes remove,
> >and I'm suspecting these users are doing something silly at HPC scale.
> >
> >Thanks!
> >
> >Joe
> >
> >>Hi all,
> >>
> >>We've recently run into a situation where I'm seeing the pool at 90-100 %b
> >>and our ZILs at 90-100 %w, yet all of the spindles are relatively idle.
> >>Furthermore, local I/O is normal, and testing can quickly and easily put
> >>both the pool and the spindles in the vdev into high activity.
> >>
> >>The system is primarily accessed via NFS (home server for an HPC
> >>environment). We've had users do evil things before to cause pain, but
> >>this is most odd, as I would only expect this behavior if we had a faulty
> >>device in the pool with high %b (we don't) or if we had some sort of
> >>COW-related issue, such as being below 15% or so free space. In this case,
> >>we are less than half full on a 108TB raidz3 pool.
> >>
> >>latencytop shows a lot of ZFS ZIL Writer latency, but that's to be
> >>expected given what I see above. Pool I/O with zpool iostat is normal-ish,
> >>and as I said, simple raw writes to the pool show expected performance
> >>when done locally.
> >>
> >>Does anyone have any ideas?
> >>
> >>Thanks,
> >>
> >>Joe
> >>
> >>--
> >>Joe Hetrick
> >>perl -e 'print pack(h*,a6865647279636b604269647a616e69647f627e2e65647a0)'
> >>BOFH Excuse: doppler effect
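For the dtrace question quoted above -- which clients and paths are behind the create/setattr/write storm -- a rough sketch against the nfsv3 provider might look like the following. This assumes the provider is present on your build (check with dtrace -l -P nfsv3 first); ci_remote and noi_curpath are the documented translator fields, but verify them on your system:

#!/usr/sbin/dtrace -s

#pragma D option quiet

/* NFSv3 operations per client, bucketed by operation name. */
nfsv3:::op-create-start,
nfsv3:::op-setattr-start,
nfsv3:::op-write-start,
nfsv3:::op-rename-start,
nfsv3:::op-remove-start
{
        @ops[args[0]->ci_remote, probename] = count();
}

/* Which paths the writes are landing on (noi_curpath may show as
 * "<unknown>" for some file handles). */
nfsv3:::op-write-start
{
        @paths[args[1]->noi_curpath] = count();
}

profile:::tick-30s
{
        printf("\n=== NFSv3 ops by client ===\n");
        printa("  %-18s %-24s %@8d\n", @ops);

        printf("\n=== top write targets ===\n");
        trunc(@paths, 20);
        printa("  %-60s %@8d\n", @paths);

        trunc(@ops);
        trunc(@paths);
}

Run it for a minute or two during one of the bad spells; if the write targets cluster under the users' gvfs metadata store (typically something like ~/.local/share/gvfs-metadata), that would line up with the gvfsd-metadata theory above.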

--
Joe Hetrick
perl -e 'print pack(h*,a6865647279636b604269647a616e69647f627e2e65647a0)'
BOFH Excuse: old inkjet cartridges emanate barium-based fumes