On Mon, Mar 21, 2011 at 3:45 PM, Roy Sigurd Karlsbakk <r...@karlsbakk.net> 
wrote:
>
> > Our main backups storage server has 3x 8-drive raidz2 vdevs. Was
> > replacing the 500 GB drives in one vdev with 1 TB drives. The last 2
> > drives took just under 300 hours each. :( The first couple drives
> > took approx 150 hours each, and then it just started taking longer and
> > longer for each drive.
>
> That's strange indeed. I just replaced 21 drives (seven 2TB drives in each of 
> three raidz2 VDEVs) with 3TB ones, and resilver times were quite stable, 
> until the last replace, which was a bit faster. Have you checked 'iostat 
> -en'? If one (or more) of the drives is having I/O errors, that may slow 
> down the whole pool.
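
For what it's worth, a quick way to spot a drive that is quietly logging
errors is to filter the error columns of 'iostat -en'. A rough sketch,
assuming the usual s/w, h/w, trn, tot column layout:

iostat -en | awk 'NR <= 2 || $4 + 0 > 0'   # assumes error total in column 4; prints headers plus devices with errors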

We have production servers with 9 vdevs (mirrored) doing `zfs send` daily
to backup servers with 7 vdevs (each a 3-disk raidz1). Some backup servers
that receive datasets with lots of small files (email/web) keep seeing
worse and worse resilver times.
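
The daily job is roughly an incremental snapshot send piped over ssh. A
minimal sketch of the pattern (dataset and host names here are
hypothetical):

# hypothetical dataset/host names, shown only to illustrate the daily job
zfs snapshot tank/mail@2011-03-22
zfs send -i tank/mail@2011-03-21 tank/mail@2011-03-22 | \
    ssh backupserver zfs receive -F backup/mail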

# zpool status
  pool: backup
 state: DEGRADED
status: One or more devices has been removed by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
 scrub: resilver in progress for 646h13m, 100.00% done, 0h0m to go
config:

        NAME          STATE     READ WRITE CKSUM
        backup        DEGRADED     0     0     0
          raidz1-0    ONLINE       0     0     0
            c4t2d0    ONLINE       0     0     0
            c4t3d0    ONLINE       0     0     0
            c4t4d0    ONLINE       0     0     0
          raidz1-1    ONLINE       0     0     0
            c4t5d0    ONLINE       0     0     0
            c4t6d0    ONLINE       0     0     0
            c4t7d0    ONLINE       0     0     0
          raidz1-2    DEGRADED     0     0     0
            c4t8d0    ONLINE       0     0     0
            spare-1   DEGRADED     0     0  216M
              c4t9d0  REMOVED      0     0     0
              c4t1d0  ONLINE       0     0     0  874G resilvered
            c4t10d0   ONLINE       0     0     0
          raidz1-3    ONLINE       0     0     0
            c4t11d0   ONLINE       0     0     0
            c4t12d0   ONLINE       0     0     0
            c4t13d0   ONLINE       0     0     0
          raidz1-4    ONLINE       0     0     0
            c4t14d0   ONLINE       0     0     0
            c4t15d0   ONLINE       0     0     0
            c4t16d0   ONLINE       0     0     0
          raidz1-5    ONLINE       0     0     0
            c4t17d0   ONLINE       0     0     0
            c4t18d0   ONLINE       0     0     0
            c4t19d0   ONLINE       0     0     0
          raidz1-6    ONLINE       0     0     0
            c4t20d0   ONLINE       0     0     0
            c4t21d0   ONLINE       0     0     0
            c4t22d0   ONLINE       0     0     0
        spares
          c4t1d0      INUSE     currently in use

# zpool list backup
NAME     SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
backup  19.0T  18.7T   315G    98%  DEGRADED  -
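
For completeness, the 'action' line in the status output maps to commands
along these lines (the replacement device name is hypothetical):

zpool online backup c4t9d0            # bring the removed disk back in
zpool replace backup c4t9d0 c4t23d0   # or swap in a new disk (c4t23d0 is hypothetical)
zpool detach backup c4t9d0            # or drop it and keep the spare c4t1d0 permanently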

Even though the pool is at 98% capacity, that's usually not a problem
when the production server is sending datasets that hold VM images. Here
we seem to be clearly maxing out the IOPS of the disks in the raidz1-2
vdev. It seems logical to go back to mirrors for this kind of workload
(lots of small files, nothing sequential).
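
Back-of-the-envelope, assuming ~100 random-read IOPS per disk and that a
raidz vdev behaves roughly like a single disk for small random reads:

DISK_IOPS=100                                            # assumed figure for a 7200rpm SATA disk
echo "7 x 3-disk raidz1:  $((7 * DISK_IOPS)) IOPS"       # about one disk's worth per vdev
echo "21 disks as mirrors: $((21 * DISK_IOPS)) IOPS"     # every disk can serve reads

Even if the constants are off, it's the ratio that hurts with millions of
small files.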

What I cannot explain is why c4t1d0 is doing lots of reads on top of the
expected writes. It seems to be holding back the resilver, while I would
expect only c4t8d0 and c4t10d0 to be reading. I do not understand the ZFS
internals that make this happen. Can anyone explain it? The server is
doing nothing but the resilver (it's not even receiving new zfs sends).
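
One thing that might help narrow it down is watching the per-device
breakdown from the pool's point of view while the resilver runs:

zpool iostat -v backup 5    # per-vdev/per-disk read and write ops every 5 seconds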

By the way, since this is OpenSolaris 2009.06, there is a nasty bug where,
if fmd is enabled, it records billions of checksum errors until the disk
is full, so I've had to disable it while the resilver is running.
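
In case it's useful to someone, the workaround here is just a temporary
SMF disable (the -t keeps it from persisting across reboots):

svcadm disable -t svc:/system/fmd:default   # while the resilver runs
svcadm enable svc:/system/fmd:default       # once it completes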

# iostat -Xn 1 | egrep '(c4t(8|10|1)d0|r/s)'
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   35.2   14.9  907.9  135.8  0.0  0.4    0.1    8.6   1  12 c4t1d0
   44.7    4.0  997.6   78.3  0.0  0.3    0.1    5.8   1  10 c4t8d0
   44.8    4.0  997.6   78.3  0.0  0.3    0.1    5.8   1  10 c4t10d0
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   98.6   46.9 2628.2   52.7  0.0  1.3    0.2    8.6   2  39 c4t1d0
  146.5    0.0 2739.2    0.0  0.0  0.8    0.1    5.1   2  25 c4t8d0
  144.5    0.0 2805.9    0.0  0.0  0.7    0.1    5.1   2  26 c4t10d0
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  108.6   45.7 2809.1   50.7  0.0  1.1    0.1    6.9   2  35 c4t1d0
  146.2    0.0 2624.2    0.0  0.0  0.3    0.1    2.3   1  18 c4t8d0
  149.2    0.0 2737.0    0.0  0.0  0.3    0.1    2.3   1  16 c4t10d0
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  113.0   23.0 3226.9   28.0  0.0  1.2    0.1    8.9   2  40 c4t1d0
  159.0    0.0 3286.9    0.0  0.0  0.6    0.1    3.9   2  24 c4t8d0
  176.0    0.0 3545.9    0.0  0.0  0.5    0.1    3.0   2  26 c4t10d0
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  147.4   34.4 3888.9   52.1  0.0  1.5    0.2    8.3   3  43 c4t1d0
  181.7    0.0 3515.1    0.0  0.0  0.6    0.1    3.1   2  24 c4t8d0
  193.5    0.0 3489.9    0.0  0.0  0.6    0.2    3.3   4  22 c4t10d0
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  151.2   33.9 3792.7   42.7  0.0  1.5    0.1    7.9   1  36 c4t1d0
  197.5    0.0 3856.9    0.0  0.0  0.4    0.1    2.3   2  19 c4t8d0
  164.6    0.0 3928.1    0.0  0.0  0.7    0.1    4.2   1  24 c4t10d0
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  171.0   90.0 4426.3  121.5  0.0  1.3    0.1    4.9   3  51 c4t1d0
  184.0    0.0 4426.8    0.0  0.0  0.7    0.1    4.0   2  30 c4t8d0
  195.0    0.0 4430.3    0.0  0.0  0.7    0.1    3.7   2  32 c4t10d0
^C
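
A trivial loop makes it easy to see whether the time estimate keeps
slipping over the course of a day:

while true; do
    date
    zpool status backup | grep 'resilver in progress'
    sleep 3600
done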

Anyone else with over 600 hours of resilver time? :-)

Thank you,


Giovanni Tirloni (gtirl...@sysdroid.com)