comments below...

On May 12, 2012, at 8:10 AM, Jim Klimov wrote:

> 2012-05-12 7:01, Jim Klimov wrote:
>> Overall the applied question is whether the disk will
>> make it back into the live pool (ultimately with no
>> continuous resilvering), and how fast that can be done -
>> I don't want to risk the big pool with nonredundant
>> arrays for too long.
>
> Here lies another "grumpy gripe", although maybe pertaining
> to the oldish snv_117 on that box: the system is not making
> its best possible effort to complete the resilver ASAP :)
>
> According to "iostat 60", disk utilizations of this raidz
> set vary 15-50% busy, queue lengths vary within 5 outstanding
> tasks, the CPU kernel time is 2-7% with over 90% idling, and
> over 2GB of RAM remains free... Why won't it go and complete
> the quest faster? Can some tire be kicked? ;)
>
> Sat May 12 19:06:09 MSK 2012
>                  extended device statistics
>    r/s    w/s    kr/s    kw/s  wait  actv  wsvc_t  asvc_t  %w  %b  device
>  309.6    3.8 14863.0     5.0   0.0   4.7     0.0    15.0   0  65  c0t1d0
>  312.5    3.9 14879.7     5.1   0.0   4.6     0.0    14.7   0  64  c4t3d0
>  308.5    4.0 14855.0     5.2   0.0   4.7     0.0    15.1   0  66  c6t5d0
>  310.7    3.9 14855.7     5.1   0.0   4.6     0.0    14.8   0  65  c7t6d0
>    0.0  225.3     0.0 14484.2   0.0   8.1     0.0    36.0   0  83  c5t6d0
> Sat May 12 19:07:09 MSK 2012
>                  extended device statistics
>    r/s    w/s    kr/s    kw/s  wait  actv  wsvc_t  asvc_t  %w  %b  device
>  228.0    3.0  6859.7     4.0   0.0   6.9     0.0    29.9   0  81  c0t1d0
>  227.7    3.3  6850.0     4.3   0.0   6.9     0.0    30.0   0  81  c4t3d0
>  228.1    3.4  6857.9     4.4   0.0   7.0     0.0    30.0   0  81  c6t5d0
>  227.6    3.1  6860.4     4.1   0.0   7.1     0.0    30.7   0  82  c7t6d0
>    0.0  225.8     0.0  6379.1   0.0   8.1     0.0    35.8   0  85  c5t6d0

In general, asvc_t of this magnitude along with actv of this size means
you might be better off lowering zfs_vdev_max_pending (a sketch of how
to do that follows at the end of this message).

> ...
>
> On some minutes the disks sit there doing almost nothing at all:
>
> Sat May 12 19:01:09 MSK 2012
>                  extended device statistics
>    r/s    w/s    kr/s    kw/s  wait  actv  wsvc_t  asvc_t  %w  %b  device
>   10.7    0.8   665.4     0.7   0.0   0.1     0.0    11.4   0  13  c0t1d0
>   10.7    0.9   667.5     0.7   0.0   0.1     0.0    11.6   0  13  c4t3d0
>   10.7    0.8   666.4     0.7   0.0   0.1     0.0    11.9   0  13  c6t5d0
>   10.7    0.9   668.5     0.7   0.0   0.1     0.0    11.6   0  13  c7t6d0
>    0.1   15.5     0.6    20.3   0.0   0.0     0.0     0.2   0   0  c5t6d0

This behaviour cannot be debugged with iostat or any of the various
CPU-monitoring stat utilities. There is blocking somewhere, and it is
likely to be in the data path. You might try iosnoop and look for I/O
completion times that are large (> 1 second); an example follows at the
end of this message.
 -- richard

> last pid: 18121;  load avg: 0.16, 0.15, 0.12;  up 0+16:03:44  19:06:51
> 96 processes: 95 sleeping, 1 on cpu
> CPU states: 96.6% idle, 0.2% user, 3.2% kernel, 0.0% iowait, 0.0% swap
> Memory: 16G phys mem, 2476M free mem, 16G total swap, 16G free swap
> ...
>
>> It has already taken 2 days to try and resilver a 250Gb
>> disk into the pool, but never made it past 100Gb progress. :(
>> Reports no errors that I'd see either... :)
>
> Well, that part seems to have been explained in my other
> mails, and hopefully worked around by the hotspare.
>
> //Jim

--
ZFS Performance and Training
richard.ell...@richardelling.com
+1-760-896-4422
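Regarding the zfs_vdev_max_pending comment above, a minimal sketch of
inspecting and lowering the tunable on an snv-era kernel. The value 4
here is only an illustration, not a recommendation tuned to this
particular pool:

    # Inspect the current value on the live kernel:
    echo zfs_vdev_max_pending/D | mdb -k

    # Lower it on the fly (0t4 is mdb notation for decimal 4):
    echo zfs_vdev_max_pending/W0t4 | mdb -kw

    # Or make the change persistent by adding this line to
    # /etc/system and rebooting:
    set zfs:zfs_vdev_max_pending = 4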
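And for the iosnoop hunt, a sketch that assumes the DTraceToolkit
iosnoop, whose -D option (as I recall it) prints the request-to-completion
delta in microseconds as the first output column:

    # Watch per-I/O events with an elapsed-time (DELTA) column:
    iosnoop -D

    # Keep only I/Os that took longer than one second (1,000,000 us);
    # NR > 1 skips the header line:
    iosnoop -D | awk 'NR > 1 && $1 > 1000000'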