Hi, I have a low-power server with three drives in it, like so:

matt@vault:~$ zpool status
  pool: rpool
 state: ONLINE
 scan: resilvered 588M in 0h3m with 0 errors on Fri Jan  7 07:38:06 2011
config:

        NAME          STATE     READ WRITE CKSUM
        rpool         ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            c8t1d0s0  ONLINE       0     0     0
            c8t0d0s0  ONLINE       0     0     0
        cache
          c12d0s0     ONLINE       0     0     0

errors: No known data errors


I'm running netatalk file sharing for Macs, and using the box as a Time
Machine backup server for my Mac laptop.

When files are copying to the server, I often see periods of a minute or so 
where network traffic stops completely. I'm convinced the bottleneck is on the 
storage side: when it happens I can still ping the machine, and if I have an 
ssh window open I can still watch a `top` command updating smoothly. But if I 
try to do anything that touches disk (e.g. `ls`), that command stalls. The 
moment the pool comes good, everything comes good at once: the file copies 
across the network continue, and so on.
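
In case the timing matters, here's roughly the loop I've been using to log 
the stalls (just a sketch; the 5-second threshold is arbitrary, and I use 
perl for epoch seconds since the stock Solaris date has no %s):

    #!/bin/sh
    # time a disk-touching command every 5 seconds and log any run
    # that takes suspiciously long
    while :; do
        start=`perl -e 'print time'`
        ls / > /dev/null                  # any path that lives on the pool
        end=`perl -e 'print time'`
        elapsed=`expr $end - $start`
        [ "$elapsed" -gt 5 ] && echo "`date`: disk op stalled for ${elapsed}s"
        sleep 5
    done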

If I have an ssh session open and run `iostat -xn 5` (the extended device 
statistics), I see something like this:


                    extended device statistics              
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    1.2   36.0  153.6 4608.0  1.2  0.3   31.9    9.3  16  18 c12d0
    0.0  113.4    0.0 7446.7  0.8  0.1    7.0    0.5  15   5 c8t0d0
    0.2  106.4    4.1 7427.8  4.0  0.1   37.8    1.4  93  14 c8t1d0
                    extended device statistics              
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.4   73.2   25.7 9243.0  2.3  0.7   31.6    9.8  34  37 c12d0
    0.0  226.6    0.0 24860.5  1.6  0.2    7.0    0.9  25  19 c8t0d0
    0.2  127.6    3.4 12377.6  3.8  0.3   29.7    2.2  91  27 c8t1d0
                    extended device statistics              
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0   44.2    0.0 5657.6  1.4  0.4   31.7    9.0  19  20 c12d0
    0.2   76.0    4.8 9420.8  1.1  0.1   14.2    1.7  12  13 c8t0d0
    0.0   16.6    0.0 2058.4  9.0  1.0  542.1   60.2 100 100 c8t1d0
                    extended device statistics              
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.2    0.0   25.6  0.0  0.0    0.3    2.3   0   0 c12d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c8t0d0
    0.0   11.0    0.0 1365.6  9.0  1.0  818.1   90.9 100 100 c8t1d0
                    extended device statistics              
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.2    0.0    0.1    0.0  0.0  0.0    0.1   25.4   0   1 c12d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c8t0d0
    0.0   17.6    0.0 2182.4  9.0  1.0  511.3   56.8 100 100 c8t1d0
                    extended device statistics              
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c12d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c8t0d0
    0.0   16.6    0.0 2058.4  9.0  1.0  542.1   60.2 100 100 c8t1d0
                    extended device statistics              
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c12d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c8t0d0
    0.0   15.8    0.0 1959.2  9.0  1.0  569.6   63.3 100 100 c8t1d0
                    extended device statistics              
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.2    0.0    0.1    0.0  0.0  0.0    0.1    0.1   0   0 c12d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c8t0d0
    0.0   17.4    0.0 2157.6  9.0  1.0  517.2   57.4 100 100 c8t1d0
                    extended device statistics              
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c12d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c8t0d0
    0.0   18.2    0.0 2256.8  9.0  1.0  494.5   54.9 100 100 c8t1d0
                    extended device statistics              
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c12d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c8t0d0
    0.0   14.8    0.0 1835.2  9.0  1.0  608.1   67.5 100 100 c8t1d0
                    extended device statistics              
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.2    0.0    0.1    0.0  0.0  0.0    0.1    0.1   0   0 c12d0
    0.0    1.4    0.0    0.6  0.0  0.0    0.0    0.2   0   0 c8t0d0
    0.0   49.0    0.0 6049.6  6.7  0.5  137.6   11.2 100  55 c8t1d0
                    extended device statistics              
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0   55.4    0.0 7091.2  1.9  0.6   34.9    9.9  27  28 c12d0
    0.2  126.0    8.6 9347.7  1.4  0.1   11.4    0.6  20   7 c8t0d0
    0.0  120.8    0.0 9340.4  4.9  0.2   40.5    1.5  77  18 c8t1d0
                    extended device statistics              
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    1.2   57.0  153.6 7271.2  1.8  0.5   31.0    9.4  26  28 c12d0
    0.2  108.4   12.8 6498.9  0.3  0.1    2.5    0.6   6   5 c8t0d0
    0.2  104.8    5.2 6506.8  4.0  0.2   38.2    1.4  67  15 c8t1d0



The stalls line up with the drive c8t1d0 sitting at 100% wait and 100% busy 
while doing only slow i/o, typically writing about 2MB/s. Meanwhile the other 
drive (c8t0d0) is all zeros... doing nothing at all.
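
I've also started watching the pool-level view during a stall, to see whether 
ZFS itself reports the mirror this lopsided or whether it's something below 
the vdev layer:

    # per-vdev statistics for the pool, refreshed every 5 seconds
    zpool iostat -v rpool 5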

The drives are:
c8t0d0 - Western Digital Green - SATA_____WDC_WD15EARS-00Z_____WD-WMAVU2582242
c8t1d0 - Samsung EcoGreen F2 - SATA_____SAMSUNG_HD154UI_______S1XWJDWZ309550


I've installed smartmontools and run both the short and long self-tests on 
both drives; neither test found any errors.
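
For the record, the invocations were roughly these (the device paths and the 
-d flag are from memory, so they may need adjusting for your controller):

    # run the self-tests, then read back the results once they finish
    smartctl -d sat -t short /dev/rdsk/c8t0d0s0
    smartctl -d sat -t long  /dev/rdsk/c8t0d0s0
    smartctl -d sat -l selftest /dev/rdsk/c8t0d0s0
    # ...and the same for /dev/rdsk/c8t1d0s0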


I suspect the c8t1d0 Samsung is the lemon here, and that for some reason it 
gets stuck in periods where it can't write any faster than about 2MB/s. Does 
that sound right?
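
If so, my next step would be to compare the two drives directly with a raw 
sequential read from each (read-only, so it should be safe on a live pool). 
A write-path problem wouldn't show up here, but a slow read would still be 
telling. A sketch, assuming the usual raw device paths:

    # time a 1GB sequential read from the raw device of each drive;
    # divide 1024MB by the elapsed time to get MB/s
    ptime dd if=/dev/rdsk/c8t0d0s0 of=/dev/null bs=1024k count=1024
    ptime dd if=/dev/rdsk/c8t1d0s0 of=/dev/null bs=1024k count=1024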


Secondly, I wonder why the whole file system seems to hang up at these times. 
Since a mirror can satisfy reads from either side, surely a web page could be 
served by reading from the idle drive (c8t0d0) while the slow drive (c8t1d0) 
is stuck writing?
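
To confirm whether reads really are blocked pool-wide during a stall (and not 
just my interactive commands), I plan to watch per-second ZFS operation 
counts, along these lines:

    # ZFS operation counts every 5 seconds; if the read columns drop
    # to zero during a stall, the whole pool is blocked, not just writes
    fsstat zfs 5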

The box has 4GB of RAM and isn't doing much beyond running Apache httpd and 
netatalk.


Thanks for any input,
Matt