Observation below...

On Feb 4, 2011, at 7:10 PM, Matt Connolly wrote:

> Hi, I have a low-power server with three drives in it, like so:
> 
> 
> matt@vault:~$ zpool status
>  pool: rpool
> state: ONLINE
> scan: resilvered 588M in 0h3m with 0 errors on Fri Jan  7 07:38:06 2011
> config:
> 
>        NAME          STATE     READ WRITE CKSUM
>        rpool         ONLINE       0     0     0
>          mirror-0    ONLINE       0     0     0
>            c8t1d0s0  ONLINE       0     0     0
>            c8t0d0s0  ONLINE       0     0     0
>        cache
>          c12d0s0     ONLINE       0     0     0
> 
> errors: No known data errors
> 
> 
> I'm running netatalk file sharing for mac, and using it as a time machine 
> backup server for my mac laptop.
> 
> When files are copying to the server, I often see periods of a minute or so 
> where network traffic stops. I'm convinced there's a bottleneck on the 
> storage side, because when this happens I can still ping the machine, and 
> if I have an ssh window open I can still see a `top` command updating 
> smoothly. However, if I try to do anything that touches disk (e.g. `ls`), 
> that command stalls. When it comes good, everything comes good: file 
> copies across the network continue, etc.
> 
> If I have an ssh terminal session open and run `iostat -xn 5` I see 
> something like this:
> 
> 
>                    extended device statistics              
>    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
>    1.2   36.0  153.6 4608.0  1.2  0.3   31.9    9.3  16  18 c12d0
>    0.0  113.4    0.0 7446.7  0.8  0.1    7.0    0.5  15   5 c8t0d0
>    0.2  106.4    4.1 7427.8  4.0  0.1   37.8    1.4  93  14 c8t1d0
>                    extended device statistics              
>    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
>    0.4   73.2   25.7 9243.0  2.3  0.7   31.6    9.8  34  37 c12d0
>    0.0  226.6    0.0 24860.5  1.6  0.2    7.0    0.9  25  19 c8t0d0
>    0.2  127.6    3.4 12377.6  3.8  0.3   29.7    2.2  91  27 c8t1d0
>                    extended device statistics              
>    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
>    0.0   44.2    0.0 5657.6  1.4  0.4   31.7    9.0  19  20 c12d0
>    0.2   76.0    4.8 9420.8  1.1  0.1   14.2    1.7  12  13 c8t0d0
>    0.0   16.6    0.0 2058.4  9.0  1.0  542.1   60.2 100 100 c8t1d0
>                    extended device statistics              
>    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
>    0.0    0.2    0.0   25.6  0.0  0.0    0.3    2.3   0   0 c12d0
>    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c8t0d0
>    0.0   11.0    0.0 1365.6  9.0  1.0  818.1   90.9 100 100 c8t1d0
>                    extended device statistics              
>    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
>    0.2    0.0    0.1    0.0  0.0  0.0    0.1   25.4   0   1 c12d0
>    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c8t0d0
>    0.0   17.6    0.0 2182.4  9.0  1.0  511.3   56.8 100 100 c8t1d0
>                    extended device statistics              
>    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
>    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c12d0
>    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c8t0d0
>    0.0   16.6    0.0 2058.4  9.0  1.0  542.1   60.2 100 100 c8t1d0
>                    extended device statistics              
>    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
>    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c12d0
>    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c8t0d0
>    0.0   15.8    0.0 1959.2  9.0  1.0  569.6   63.3 100 100 c8t1d0
>                    extended device statistics              
>    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
>    0.2    0.0    0.1    0.0  0.0  0.0    0.1    0.1   0   0 c12d0
>    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c8t0d0
>    0.0   17.4    0.0 2157.6  9.0  1.0  517.2   57.4 100 100 c8t1d0
>                    extended device statistics              
>    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
>    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c12d0
>    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c8t0d0
>    0.0   18.2    0.0 2256.8  9.0  1.0  494.5   54.9 100 100 c8t1d0
>                    extended device statistics              
>    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
>    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c12d0
>    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c8t0d0
>    0.0   14.8    0.0 1835.2  9.0  1.0  608.1   67.5 100 100 c8t1d0
>                    extended device statistics              
>    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
>    0.2    0.0    0.1    0.0  0.0  0.0    0.1    0.1   0   0 c12d0
>    0.0    1.4    0.0    0.6  0.0  0.0    0.0    0.2   0   0 c8t0d0
>    0.0   49.0    0.0 6049.6  6.7  0.5  137.6   11.2 100  55 c8t1d0
>                    extended device statistics              
>    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
>    0.0   55.4    0.0 7091.2  1.9  0.6   34.9    9.9  27  28 c12d0
>    0.2  126.0    8.6 9347.7  1.4  0.1   11.4    0.6  20   7 c8t0d0
>    0.0  120.8    0.0 9340.4  4.9  0.2   40.5    1.5  77  18 c8t1d0
>                    extended device statistics              
>    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
>    1.2   57.0  153.6 7271.2  1.8  0.5   31.0    9.4  26  28 c12d0
>    0.2  108.4   12.8 6498.9  0.3  0.1    2.5    0.6   6   5 c8t0d0
>    0.2  104.8    5.2 6506.8  4.0  0.2   38.2    1.4  67  15 c8t1d0

The queues are building in the HBA (wait, wsvc_t, %w), not at the disk (actv,
asvc_t, %b).  Changing the disk might not help.  Changing the controller
might help immensely.
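
The heuristic above can be applied mechanically to a capture: flag devices
whose host-side wait queue is saturated while the device itself has almost
nothing in flight. A minimal sketch, assuming the `iostat -xn` column order
shown in the output above (the 90 / 1.0 thresholds here are arbitrary
cutoffs for illustration, not anything iostat defines):

```shell
# flag_hba_queue: read `iostat -xn` output on stdin and print devices where
# the host-side wait queue is saturated (%w >= 90) while the device's active
# queue is nearly empty (actv <= 1), i.e. commands are queueing in the
# HBA/driver rather than in the disk.
# Assumed column order: r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
flag_hba_queue() {
    awk 'NF == 11 && $1 ~ /^[0-9]/ {
        if ($9 >= 90 && $6 <= 1.0)
            printf "%s: wait=%s actv=%s %%w=%s\n", $11, $5, $6, $9
    }'
}
```

Run as, e.g., `iostat -xn 5 | flag_hba_queue`; on the capture above it flags
only c8t1d0.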

> The stall occurs when the drive c8t1d0 is 100% waiting, and doing only slow 
> i/o, typically writing about 2MB/s. However, the other drive is all zeros... 
> doing nothing.
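
Those slow-period numbers are at least self-consistent: with actv pinned at
1.0 and asvc_t around 60 ms, the disk completes roughly one command every
60 ms, which matches the observed ~16.6 w/s and ~2 MB/s. A quick arithmetic
check, using figures from the repeated c8t1d0 line above:

```shell
# Cross-check the stalled c8t1d0 sample: kw/s = 2058.4, w/s = 16.6,
# asvc_t = 60.2 ms, actv = 1.0 (one command in flight at a time).
awk 'BEGIN {
    kw = 2058.4; w = 16.6; asvc_ms = 60.2
    printf "avg write size: %.0f KB\n", kw / w            # ~124 KB per op
    printf "ops/s implied by asvc_t: %.1f\n", 1000 / asvc_ms  # matches w/s
}'
```

So during the stalls the drive is serializing ~124 KB writes one at a time at
~60 ms each, while nine more sit queued on the host side (wait = 9.0).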
> 
> The drives are:
> c8t0d0 - Western Digital Green - SATA_____WDC_WD15EARS-00Z_____WD-WMAVU2582242
> c8t1d0 - Samsung Silencer - SATA_____SAMSUNG_HD154UI_______S1XWJDWZ309550
> 
> 
> I've installed smartmontools and run both the short and long self-tests on 
> both drives; neither drive reported any errors.
> 

smartmontools doesn't know anything about controllers.  What sort of 
controller is it?
 -- richard

> 
> I expect that c8t1d0 is the lemon here, and for some reason gets stuck in 
> periods where it can write no faster than about 2MB/s. Does this sound 
> right?
> 
> 
> Secondly, I wonder why the whole file system seems to hang at this time. 
> Surely if the other drive is doing nothing, a web page could be served by 
> reading from the idle drive (c8t0d0) while the slow drive (c8t1d0) is 
> stuck writing.
> 
> I have 4GB RAM in the box, and it's not doing much other than running apache 
> httpd and netatalk.
> 
> 
> Thanks for any input,
> Matt
> -- 
> This message posted from opensolaris.org
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
