Observation below...

On Feb 4, 2011, at 7:10 PM, Matt Connolly wrote:
> Hi, I have a low-power server with three drives in it, like so:
>
> matt@vault:~$ zpool status
>   pool: rpool
>  state: ONLINE
>   scan: resilvered 588M in 0h3m with 0 errors on Fri Jan 7 07:38:06 2011
> config:
>
>         NAME          STATE     READ WRITE CKSUM
>         rpool         ONLINE       0     0     0
>           mirror-0    ONLINE       0     0     0
>             c8t1d0s0  ONLINE       0     0     0
>             c8t0d0s0  ONLINE       0     0     0
>         cache
>           c12d0s0     ONLINE       0     0     0
>
> errors: No known data errors
>
> I'm running netatalk file sharing for Mac, and using it as a Time Machine
> backup server for my Mac laptop.
>
> When files are copying to the server, I often see periods of a minute or so
> where network traffic stops. I'm convinced there's some bottleneck on the
> storage side of things, because when this happens I can still ping the
> machine, and if I have an ssh window open I can still see output from a
> `top` command running smoothly. However, if I try to do anything that
> touches disk (e.g. `ls`), that command stalls. When it comes good,
> everything comes good: file copies across the network continue, etc.
>
> If I have an ssh terminal session open and run `iostat -nv 5`, I see
> something like this:
>
>                 extended device statistics
>     r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
>     1.2   36.0  153.6  4608.0  1.2  0.3   31.9    9.3  16  18 c12d0
>     0.0  113.4    0.0  7446.7  0.8  0.1    7.0    0.5  15   5 c8t0d0
>     0.2  106.4    4.1  7427.8  4.0  0.1   37.8    1.4  93  14 c8t1d0
>                 extended device statistics
>     r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
>     0.4   73.2   25.7  9243.0  2.3  0.7   31.6    9.8  34  37 c12d0
>     0.0  226.6    0.0 24860.5  1.6  0.2    7.0    0.9  25  19 c8t0d0
>     0.2  127.6    3.4 12377.6  3.8  0.3   29.7    2.2  91  27 c8t1d0
>                 extended device statistics
>     r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
>     0.0   44.2    0.0  5657.6  1.4  0.4   31.7    9.0  19  20 c12d0
>     0.2   76.0    4.8  9420.8  1.1  0.1   14.2    1.7  12  13 c8t0d0
>     0.0   16.6    0.0  2058.4  9.0  1.0  542.1   60.2 100 100 c8t1d0
>                 extended device statistics
>     r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
>     0.0    0.2    0.0    25.6  0.0  0.0    0.3    2.3   0   0 c12d0
>     0.0    0.0    0.0     0.0  0.0  0.0    0.0    0.0   0   0 c8t0d0
>     0.0   11.0    0.0  1365.6  9.0  1.0  818.1   90.9 100 100 c8t1d0
>                 extended device statistics
>     r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
>     0.2    0.0    0.1     0.0  0.0  0.0    0.1   25.4   0   1 c12d0
>     0.0    0.0    0.0     0.0  0.0  0.0    0.0    0.0   0   0 c8t0d0
>     0.0   17.6    0.0  2182.4  9.0  1.0  511.3   56.8 100 100 c8t1d0
>                 extended device statistics
>     r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
>     0.0    0.0    0.0     0.0  0.0  0.0    0.0    0.0   0   0 c12d0
>     0.0    0.0    0.0     0.0  0.0  0.0    0.0    0.0   0   0 c8t0d0
>     0.0   16.6    0.0  2058.4  9.0  1.0  542.1   60.2 100 100 c8t1d0
>                 extended device statistics
>     r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
>     0.0    0.0    0.0     0.0  0.0  0.0    0.0    0.0   0   0 c12d0
>     0.0    0.0    0.0     0.0  0.0  0.0    0.0    0.0   0   0 c8t0d0
>     0.0   15.8    0.0  1959.2  9.0  1.0  569.6   63.3 100 100 c8t1d0
>                 extended device statistics
>     r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
>     0.2    0.0    0.1     0.0  0.0  0.0    0.1    0.1   0   0 c12d0
>     0.0    0.0    0.0     0.0  0.0  0.0    0.0    0.0   0   0 c8t0d0
>     0.0   17.4    0.0  2157.6  9.0  1.0  517.2   57.4 100 100 c8t1d0
>                 extended device statistics
>     r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
>     0.0    0.0    0.0     0.0  0.0  0.0    0.0    0.0   0   0 c12d0
>     0.0    0.0    0.0     0.0  0.0  0.0    0.0    0.0   0   0 c8t0d0
>     0.0   18.2    0.0  2256.8  9.0  1.0  494.5   54.9 100 100 c8t1d0
>                 extended device statistics
>     r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
>     0.0    0.0    0.0     0.0  0.0  0.0    0.0    0.0   0   0 c12d0
>     0.0    0.0    0.0     0.0  0.0  0.0    0.0    0.0   0   0 c8t0d0
>     0.0   14.8    0.0  1835.2  9.0  1.0  608.1   67.5 100 100 c8t1d0
>                 extended device statistics
>     r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
>     0.2    0.0    0.1     0.0  0.0  0.0    0.1    0.1   0   0 c12d0
>     0.0    1.4    0.0     0.6  0.0  0.0    0.0    0.2   0   0 c8t0d0
>     0.0   49.0    0.0  6049.6  6.7  0.5  137.6   11.2 100  55 c8t1d0
>                 extended device statistics
>     r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
>     0.0   55.4    0.0  7091.2  1.9  0.6   34.9    9.9  27  28 c12d0
>     0.2  126.0    8.6  9347.7  1.4  0.1   11.4    0.6  20   7 c8t0d0
>     0.0  120.8    0.0  9340.4  4.9  0.2   40.5    1.5  77  18 c8t1d0
>                 extended device statistics
>     r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
>     1.2   57.0  153.6  7271.2  1.8  0.5   31.0    9.4  26  28 c12d0
>     0.2  108.4   12.8  6498.9  0.3  0.1    2.5    0.6   6   5 c8t0d0
>     0.2  104.8    5.2  6506.8  4.0  0.2   38.2    1.4  67  15 c8t1d0

The queues are building in the HBA (wait, wsvc_t, %w), not at the disk (actv,
asvc_t, %b). Changing the disk might not help. Changing the controller might
help immensely.
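If you want to catch that in the act without staring at iostat, something like
this will flag the intervals where requests pile up in the wait queue while the
device itself has only a single command in flight. It's only a sketch: it
assumes iostat's extended per-device output like the listing above (iostat -xn),
and the 5 and 1.5 thresholds are arbitrary.

  # flag devices where the queue is building in front of the device
  # (large wait, actv stuck around 1), using the column order shown above
  iostat -xn 5 | awk '$NF ~ /^c[0-9]/ && $5 > 5 && $6 < 1.5 {
      print $NF ": wait=" $5 " actv=" $6 " wsvc_t=" $7 "ms -- queued ahead of the device"
  }'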
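It is also worth watching the same stall from the pool's point of view, to
confirm it really is that one side of the mirror holding up everything else.
Nothing fancy here, just the per-vdev view (the 5-second interval is arbitrary):

  # per-vdev operations and bandwidth for rpool, every 5 seconds
  zpool iostat -v rpool 5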
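And since the controller is the prime suspect, pin down exactly what the disks
are attached to and which driver is handling them. On an OpenSolaris-derived
box something like the following usually answers that; treat it as a sketch,
since the device paths on your system may differ.

  # list the SATA controller instances and the disks attached to them
  cfgadm -al | grep -i sata

  # walk the device tree with driver bindings; look for the nodes the c8 disks sit under
  prtconf -D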
> The stall occurs when the drive c8t1d0 is 100% waiting, and doing only slow
> i/o, typically writing about 2MB/s. However, the other drive is all zeros...
> doing nothing.
>
> The drives are:
> c8t0d0 - Western Digital Green - SATA_____WDC_WD15EARS-00Z_____WD-WMAVU2582242
> c8t1d0 - Samsung Silencer - SATA_____SAMSUNG_HD154UI_______S1XWJDWZ309550
>
> I've installed smartmon and done a short and long test on both drives, all
> resulting in no found errors.

smartmon doesn't know anything about controllers. What sort of controller is it?
 -- richard

> I expect that the c8t1d0 Samsung is the lemon here and for some reason is
> getting stuck in periods where it can write no faster than about 2MB/s. Does
> this sound right?
>
> Secondly, what I wonder is why the whole file system seems to hang up at this
> time. Surely if the other drive is doing nothing, a web page could be served
> by reading from the available drive (c8t0d0) while the slow drive (c8t1d0) is
> stuck writing slowly.
>
> I have 4GB RAM in the box, and it's not doing much other than running apache
> httpd and netatalk.
>
> Thanks for any input,
> Matt
> --
> This message posted from opensolaris.org

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss