The first drive (c7t2d0) is bad and should be replaced. The second drive (c7t5d0) is either bad or going bad. This is exactly the kind of problem that can bring a Thumper to its knees: ZFS performance is horrific, and as soon as you drop the bad disks things magically return to normal.
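
As a quick illustration (the thresholds are arbitrary, and the field numbers assume the "iostat -xn" column order), a filter like this will flag the suspects:

    # print devices that are >=90% busy or averaging >200ms per request
    iostat -xn 5 | awk '$1 ~ /^[0-9.]+$/ && ($10+0 >= 90 || $8+0 > 200)'

In the output you posted below, c7t2d0 and c7t5d0 are the only two devices that trip it.
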
My first recommendation is to pull the SMART data from the disks if you can. I wrote a blog entry about SMART back in 2008 that addresses exactly the behavior you're seeing: http://www.cuddletech.com/blog/pivot/entry.php?id=993 Yes, people will claim that SMART data is useless for predicting failures, but in a case like yours you are just looking for data to corroborate a hypothesis.

To test this condition, "zpool offline ..." c7t2d0, which emulates removal, and see if performance improves. On Thumpers I'd build a list of "suspect disks" based on iostat output like you show, correlate it with the SMART data, and then systematically offline disks to confirm that a given disk really was the problem.

In my experience, the only other reason you'll legitimately see really weird "bottoming out" of I/O like this is if you hit the maximum concurrent I/O limit in ZFS (until recently that limit was 35): you'd see actv=35, and then when the device finally processed the I/Os the thing would snap back to life. But even in those cases you shouldn't see request times (asvc_t) rise above 200ms.

All that to say: replace those disks, or at least test it. SSDs won't help; one or more drives are toast.
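
A minimal sketch of that test (the pool name "tank" is an assumption, and the smartctl device path and options may need adjusting for your controller):

    # pull SMART health and error counters for the suspect disk
    smartctl -a /dev/rdsk/c7t2d0s0

    # take the suspect disk offline; -t makes it temporary (cleared at
    # reboot), and the raidz redundancy keeps the data available
    zpool offline -t tank c7t2d0
    zpool status tank

    # watch service times (asvc_t) and %b with the disk out of the picture
    iostat -xn 5

    # bring it back when you're done testing
    zpool online tank c7t2d0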

benr.

On 5/8/10 9:30 PM, Emily Grettel wrote:
> Hi Giovanni,
>
> Thanks for the reply.
>
> Here's a bit of iostat after uncompressing a 2.4 GB RAR file that has
> one DWF file that we use.
>
>                     extended device statistics
>     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
>     1.0   13.0   26.0   18.0  0.0  0.0    0.0    0.8   0   1 c7t1d0
>     2.0    5.0   77.0   12.0  2.4  1.0  343.8  142.8 100 100 c7t2d0
>     1.0   16.0   25.5   15.5  0.0  0.0    0.0    0.3   0   0 c7t3d0
>     0.0   10.0    0.0   17.0  0.0  0.0    3.2    1.2   1   1 c7t4d0
>     1.0   12.0   25.5   15.5  0.4  0.1   32.4   10.9  14  14 c7t5d0
>     1.0   15.0   25.5   18.0  0.0  0.0    0.1    0.1   0   0 c0t1d0
>                     extended device statistics
>     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
>     0.0    0.0    0.0    0.0  2.0  1.0    0.0    0.0 100 100 c7t2d0
>     1.0    0.0    0.5    0.0  0.0  0.0    0.0    0.1   0   0 c7t0d0
>                     extended device statistics
>     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
>     5.0   15.0  128.0   18.0  0.0  0.0    0.0    1.8   0   3 c7t1d0
>     1.0    9.0   25.5   18.0  2.0  1.8  199.7  179.4 100 100 c7t2d0
>     3.0   13.0  102.5   14.5  0.0  0.1    0.0    5.2   0   5 c7t3d0
>     3.0   11.0  102.0   16.5  0.0  0.1    2.3    4.2   1   6 c7t4d0
>     1.0    4.0   25.5    2.0  0.4  0.8   71.3  158.9  12  79 c7t5d0
>     5.0   16.0  128.5   19.0  0.0  0.1    0.1    2.6   0   5 c0t1d0
>                     extended device statistics
>     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
>     0.0    4.0    0.0    2.0  2.0  2.0  496.1  498.0  99 100 c7t2d0
>     0.0    0.0    0.0    0.0  0.0  1.0    0.0    0.0   0 100 c7t5d0
>                     extended device statistics
>     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
>     7.0    0.0  204.5    0.0  0.0  0.0    0.0    0.2   0   0 c7t1d0
>     1.0    0.0   25.5    0.0  3.0  1.0 2961.6 1000.0  99 100 c7t2d0
>     8.0    0.0  282.0    0.0  0.0  0.0    0.0    0.3   0   0 c7t3d0
>     6.0    0.0  282.5    0.0  0.0  0.0    6.1    2.3   1   1 c7t4d0
>     0.0    3.0    0.0    5.0  0.5  1.0  165.4  333.3  18 100 c7t5d0
>     7.0    0.0  204.5    0.0  0.0  0.0    0.0    1.6   0   1 c0t1d0
>     2.0    2.0   89.0   12.0  0.0  0.0    3.1    6.1   1   2 c3t0d0
>     0.0    2.0    0.0   12.0  0.0  0.0    0.0    0.2   0   0 c3t1d0
>
> Sometimes two or more disks are going at 100. How does one solve this
> issue if it's a firmware bug? I tried looking around for Western Digital
> firmware for the WD10EADS but couldn't find any available.
>
> Would adding an SSD or two help here?
>
> Thanks,
> Em
>
> ------------------------------------------------------------------------
> Date: Fri, 7 May 2010 14:38:25 -0300
> Subject: Re: [zfs-discuss] ZFS Hard disk buffer at 100%
> From: gtirl...@sysdroid.com
> To: emilygrettelis...@hotmail.com
> CC: zfs-discuss@opensolaris.org
>
> On Fri, May 7, 2010 at 8:07 AM, Emily Grettel
> <emilygrettelis...@hotmail.com> wrote:
>
>     Hi,
>
>     I've had my RAIDz volume working well on snv_131, but it has come to
>     my attention that there have been some read issues with the drives.
>     Previously I thought this was a CIFS problem, but I'm noticing that
>     when transferring files or uncompressing some fairly large 7z
>     (1-2 GB) files (or even smaller RARs, 200-300 MB), occasionally
>     running iostat will give the %b as 100 for a drive or two.
>
> That's the percent of time the disk is busy (transactions in progress)
> - iostat(1M).
>
>     I have the Western Digital EADS 1TB drives (Green ones) and not the
>     more expensive Black or enterprise drives (our sysadmin's fault).
>
>     The pool in question spans 4x 1TB drives.
>
>     What exactly does this mean? Is it a controller problem, a disk
>     problem, or a cable problem? I've got this on commodity hardware, as
>     it's only used for a small business with 4-5 staff accessing our
>     media server. It's using the Intel ICHR SATA controller. I've
>     already changed the cables, swapped out the odd drive that exhibited
>     this issue, and the only thing I can think of is to buy an Intel or
>     LSI SATA card.
>
>     The scrub sessions take almost a day and a half now (previously at
>     most 12 hours!), but there's also 70% of space being used (file-wise
>     they're chunky MPG files or compressed artwork), and there are no
>     errors reported.
>
>     Does anyone have any ideas?
>
> You might be maxing out your drives' I/O capacity. That could happen
> when ZFS is committing the transactions to disk every 30 seconds, but
> if %b is constantly high your disks might not be keeping up with the
> performance requirements.
>
> We've had some servers showing high asvc_t times, but it turned out to
> be a firmware issue in the disk controller. It was very erratic (1-2
> drives out of 24 would show that).
>
> If you look in the archives, people have sent a few averaged I/O
> performance numbers that you could compare to your workload.
>
> --
> Giovanni

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss