comments below...

On May 12, 2012, at 8:10 AM, Jim Klimov wrote:

> 2012-05-12 7:01, Jim Klimov wrote:
>> Overall the applied question is whether the disk will
>> make it back into the live pool (ultimately with no
>> continuous resilvering), and how fast that can be done -
>> I don't want to risk the big pool with nonredundant
>> arrays for too long.
>
> Here lies another "grumpy gripe", although maybe pertaining
> to the oldish snv_117 on that box: the system is not making
> its best possible effort to complete the resilver ASAP :)
>
> According to "iostat 60", disk utilizations of this raidz
> set vary 15-50% busy, queue lengths vary within 5 outstanding
> tasks, the CPU kernel time is 2-7% with over 90% idling, and
> over 2GB of RAM remains free... Why won't it go and complete
> the quest faster? Can some tire be kicked? ;)
>
> Sat May 12 19:06:09 MSK 2012
>                  extended device statistics
>    r/s    w/s    kr/s    kw/s  wait  actv  wsvc_t  asvc_t  %w  %b  device
>  309.6    3.8 14863.0     5.0   0.0   4.7     0.0    15.0   0  65  c0t1d0
>  312.5    3.9 14879.7     5.1   0.0   4.6     0.0    14.7   0  64  c4t3d0
>  308.5    4.0 14855.0     5.2   0.0   4.7     0.0    15.1   0  66  c6t5d0
>  310.7    3.9 14855.7     5.1   0.0   4.6     0.0    14.8   0  65  c7t6d0
>    0.0  225.3     0.0 14484.2   0.0   8.1     0.0    36.0   0  83  c5t6d0
> Sat May 12 19:07:09 MSK 2012
>                  extended device statistics
>    r/s    w/s    kr/s    kw/s  wait  actv  wsvc_t  asvc_t  %w  %b  device
>  228.0    3.0  6859.7     4.0   0.0   6.9     0.0    29.9   0  81  c0t1d0
>  227.7    3.3  6850.0     4.3   0.0   6.9     0.0    30.0   0  81  c4t3d0
>  228.1    3.4  6857.9     4.4   0.0   7.0     0.0    30.0   0  81  c6t5d0
>  227.6    3.1  6860.4     4.1   0.0   7.1     0.0    30.7   0  82  c7t6d0
>    0.0  225.8     0.0  6379.1   0.0   8.1     0.0    35.8   0  85  c5t6d0

In general, asvc_t of this magnitude along with actv of this size means
you might be better off lowering zfs_vdev_max_pending (a sketch of how
to do that follows at the end of this message).

> ...
>
> On some minutes the disks sit there doing almost nothing at all:
>
> Sat May 12 19:01:09 MSK 2012
>                  extended device statistics
>    r/s    w/s    kr/s    kw/s  wait  actv  wsvc_t  asvc_t  %w  %b  device
>   10.7    0.8   665.4     0.7   0.0   0.1     0.0    11.4   0  13  c0t1d0
>   10.7    0.9   667.5     0.7   0.0   0.1     0.0    11.6   0  13  c4t3d0
>   10.7    0.8   666.4     0.7   0.0   0.1     0.0    11.9   0  13  c6t5d0
>   10.7    0.9   668.5     0.7   0.0   0.1     0.0    11.6   0  13  c7t6d0
>    0.1   15.5     0.6    20.3   0.0   0.0     0.0     0.2   0   0  c5t6d0

This behaviour cannot be debugged with iostat or any of the various
CPU-monitoring stat utilities. There is blocking somewhere, and it is
likely to be in the data path. You might try iosnoop and look for I/O
completion times that are large (> 1 second); an example follows at the
end of this message.
 -- richard

> last pid: 18121;  load avg: 0.16, 0.15, 0.12;  up 0+16:03:44  19:06:51
> 96 processes: 95 sleeping, 1 on cpu
> CPU states: 96.6% idle, 0.2% user, 3.2% kernel, 0.0% iowait, 0.0% swap
> Memory: 16G phys mem, 2476M free mem, 16G total swap, 16G free swap
> ...
>
>> It has already taken 2 days to try and resilver a 250Gb
>> disk into the pool, but never made it past 100Gb progress. :(
>> Reports no errors that I'd see either... :)
>
> Well, that part seems to have been explained in my other
> mails, and hopefully worked around by the hotspare.
>
> //Jim

--
ZFS Performance and Training
richard.ell...@richardelling.com
+1-760-896-4422
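Regarding the zfs_vdev_max_pending comment above, a minimal sketch of
inspecting and lowering the tunable on an snv-era kernel. The value 4
here is only an illustration, not a recommendation tuned to this
particular pool:

    # Inspect the current value on the live kernel:
    echo zfs_vdev_max_pending/D | mdb -k

    # Lower it on the fly (0t4 is mdb notation for decimal 4):
    echo zfs_vdev_max_pending/W0t4 | mdb -kw

    # Or make the change persistent by adding this line to
    # /etc/system and rebooting:
    set zfs:zfs_vdev_max_pending = 4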
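And for the iosnoop hunt, a sketch that assumes the DTraceToolkit
iosnoop, whose -D option (as I recall it) prints the request-to-completion
delta in microseconds as the first output column:

    # Watch per-I/O events with an elapsed-time (DELTA) column:
    iosnoop -D

    # Keep only I/Os that took longer than one second (1,000,000 us);
    # NR > 1 skips the header line:
    iosnoop -D | awk 'NR > 1 && $1 > 1000000'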