Hi, I have a low-power server with three drives in it, like so:
matt@vault:~$ zpool status
  pool: rpool
 state: ONLINE
 scan: resilvered 588M in 0h3m with 0 errors on Fri Jan  7 07:38:06 2011
config:

        NAME          STATE     READ WRITE CKSUM
        rpool         ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            c8t1d0s0  ONLINE       0     0     0
            c8t0d0s0  ONLINE       0     0     0
        cache
          c12d0s0     ONLINE       0     0     0

errors: No known data errors

I'm running netatalk file sharing for the Mac, and using it as a Time Machine backup server for my Mac laptop. When files are copying to the server, I often see periods of a minute or so where network traffic stops. I'm convinced there's a bottleneck somewhere on the storage side, because when this happens I can still ping the machine, and if I have an ssh window open I can still see the output of a `top` command updating smoothly. However, if I try to do anything that touches disk (e.g. `ls`), that command stalls. The moment it comes good, everything comes good at once: file copies across the network continue, and so on.

If I have an ssh terminal session open and run `iostat -nv 5`, I see something like this:

                    extended device statistics
    r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    1.2   36.0  153.6  4608.0  1.2  0.3   31.9    9.3  16  18 c12d0
    0.0  113.4    0.0  7446.7  0.8  0.1    7.0    0.5  15   5 c8t0d0
    0.2  106.4    4.1  7427.8  4.0  0.1   37.8    1.4  93  14 c8t1d0
                    extended device statistics
    r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.4   73.2   25.7  9243.0  2.3  0.7   31.6    9.8  34  37 c12d0
    0.0  226.6    0.0 24860.5  1.6  0.2    7.0    0.9  25  19 c8t0d0
    0.2  127.6    3.4 12377.6  3.8  0.3   29.7    2.2  91  27 c8t1d0
                    extended device statistics
    r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0   44.2    0.0  5657.6  1.4  0.4   31.7    9.0  19  20 c12d0
    0.2   76.0    4.8  9420.8  1.1  0.1   14.2    1.7  12  13 c8t0d0
    0.0   16.6    0.0  2058.4  9.0  1.0  542.1   60.2 100 100 c8t1d0
                    extended device statistics
    r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.2    0.0    25.6  0.0  0.0    0.3    2.3   0   0 c12d0
    0.0    0.0    0.0     0.0  0.0  0.0    0.0    0.0   0   0 c8t0d0
    0.0   11.0    0.0  1365.6  9.0  1.0  818.1   90.9 100 100 c8t1d0
                    extended device statistics
    r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.2    0.0    0.1     0.0  0.0  0.0    0.1   25.4   0   1 c12d0
    0.0    0.0    0.0     0.0  0.0  0.0    0.0    0.0   0   0 c8t0d0
    0.0   17.6    0.0  2182.4  9.0  1.0  511.3   56.8 100 100 c8t1d0
                    extended device statistics
    r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0    0.0     0.0  0.0  0.0    0.0    0.0   0   0 c12d0
    0.0    0.0    0.0     0.0  0.0  0.0    0.0    0.0   0   0 c8t0d0
    0.0   16.6    0.0  2058.4  9.0  1.0  542.1   60.2 100 100 c8t1d0
                    extended device statistics
    r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0    0.0     0.0  0.0  0.0    0.0    0.0   0   0 c12d0
    0.0    0.0    0.0     0.0  0.0  0.0    0.0    0.0   0   0 c8t0d0
    0.0   15.8    0.0  1959.2  9.0  1.0  569.6   63.3 100 100 c8t1d0
                    extended device statistics
    r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.2    0.0    0.1     0.0  0.0  0.0    0.1    0.1   0   0 c12d0
    0.0    0.0    0.0     0.0  0.0  0.0    0.0    0.0   0   0 c8t0d0
    0.0   17.4    0.0  2157.6  9.0  1.0  517.2   57.4 100 100 c8t1d0
                    extended device statistics
    r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0    0.0     0.0  0.0  0.0    0.0    0.0   0   0 c12d0
    0.0    0.0    0.0     0.0  0.0  0.0    0.0    0.0   0   0 c8t0d0
    0.0   18.2    0.0  2256.8  9.0  1.0  494.5   54.9 100 100 c8t1d0
                    extended device statistics
    r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0    0.0     0.0  0.0  0.0    0.0    0.0   0   0 c12d0
    0.0    0.0    0.0     0.0  0.0  0.0    0.0    0.0   0   0 c8t0d0
    0.0   14.8    0.0  1835.2  9.0  1.0  608.1   67.5 100 100 c8t1d0
                    extended device statistics
    r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.2    0.0    0.1     0.0  0.0  0.0    0.1    0.1   0   0 c12d0
    0.0    1.4    0.0     0.6  0.0  0.0    0.0    0.2   0   0 c8t0d0
    0.0   49.0    0.0  6049.6  6.7  0.5  137.6   11.2 100  55 c8t1d0
                    extended device statistics
    r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0   55.4    0.0  7091.2  1.9  0.6   34.9    9.9  27  28 c12d0
    0.2  126.0    8.6  9347.7  1.4  0.1   11.4    0.6  20   7 c8t0d0
    0.0  120.8    0.0  9340.4  4.9  0.2   40.5    1.5  77  18 c8t1d0
                    extended device statistics
    r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    1.2   57.0  153.6  7271.2  1.8  0.5   31.0    9.4  26  28 c12d0
    0.2  108.4   12.8  6498.9  0.3  0.1    2.5    0.6   6   5 c8t0d0
    0.2  104.8    5.2  6506.8  4.0  0.2   38.2    1.4  67  15 c8t1d0

The stall occurs when the drive c8t1d0 is 100% waiting,
doing only slow I/O, typically writing at about 2MB/s. Meanwhile, the other drive is all zeros... doing nothing at all.

The drives are:

c8t0d0 - Western Digital Green - SATA_____WDC_WD15EARS-00Z_____WD-WMAVU2582242
c8t1d0 - Samsung Silencer - SATA_____SAMSUNG_HD154UI_______S1XWJDWZ309550

I've installed smartmontools and run both a short and a long self-test on each drive; neither found any errors. I expect that the c8t1d0 Samsung is the lemon here, and that for some reason it gets stuck in periods where it can write no faster than about 2MB/s. Does this sound right?

Secondly, what I wonder is why the whole file system seems to hang up at these times. Surely, if the other drive is doing nothing, a web page could be served by reading from the idle drive (c8t0d0) while the slow drive (c8t1d0) is stuck writing slowly. I have 4GB of RAM in the box, and it's not doing much other than running Apache httpd and netatalk.

Thanks for any input,
Matt
--
This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
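P.S. For anyone wanting to sift a longer capture for these stall windows, here's a minimal sketch that pulls the saturated samples out of iostat extended-statistics output like the above. It's just an illustration: the column order is assumed from the header shown earlier (r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device), and the `find_saturated` helper and its threshold are mine, not part of any tool.

```python
# Sketch: flag saturated samples in Solaris `iostat` extended-statistics
# output. Assumes the column order from the header shown above:
#   r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device

def find_saturated(iostat_text, busy_threshold=100):
    """Return (device, kw_per_s, asvc_t_ms) for rows with %b >= threshold."""
    hits = []
    for line in iostat_text.splitlines():
        fields = line.split()
        if len(fields) != 11:         # data rows have exactly 11 columns
            continue
        try:
            kw = float(fields[3])     # kw/s: kilobytes written per second
            asvc = float(fields[7])   # asvc_t: avg active service time (ms)
            busy = int(fields[9])     # %b: percent of time device was busy
        except ValueError:
            continue                  # skip the column-header row
        if busy >= busy_threshold:
            hits.append((fields[10], kw, asvc))
    return hits

# One of the samples from above: only c8t1d0 is pegged at 100% busy.
sample = """\
    0.0   44.2    0.0  5657.6  1.4  0.4   31.7    9.0  19  20 c12d0
    0.2   76.0    4.8  9420.8  1.1  0.1   14.2    1.7  12  13 c8t0d0
    0.0   16.6    0.0  2058.4  9.0  1.0  542.1   60.2 100 100 c8t1d0
"""
print(find_saturated(sample))  # [('c8t1d0', 2058.4, 60.2)]
```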