On Mar 2, 2010, at 2:41 PM, Jeffrey Johnson wrote:
> Hi Folks,
>
> We have put together a 25T ZFS raidz2 zpool (16x2TB 5900 RPM 32MB
> Cache SATA 3.0Gb/s drives with 2x LSI SAS3081E-R SAS RAID Controllers
> presenting the drives as JBOD straight thru to the backplane) with 2
> hot-spares on OpenSolaris snv_133. The pool contains roughly 800
> Million files which are all very small (~10-200k map tiles). We had a
> hiccup with one of the drives and the resilvering process was
> initiated ... the problem is that zpool status is estimating something
> like 650 hours currently. This estimate has varied from 400 to 1800 as
> it has run over the last couple of days, but it seems to have settled
> around 650 now. That is just WAY too long ... we fear that if the end
> user of this device ever has to replace a drive in the pool, it will
> take this long to rebuild again.
>
> So, we are wondering if a) there is some way we can optimize or tune
> the pool to deal with this number of small files better and speed up
> the resilvering process or b) some way we can tweak the resilvering
> code to handle for this type of situation better.
>
> One of our engineers is looking at setting up a VM on another machine
> and using dtrace to find out where the bottleneck is, but we thought
> we might have more luck on this list.
Those are slow drives, so it will take a while to resilver. To verify
that the disks are the bottleneck, use iostat and watch the service
time (svc_t, in milliseconds). If it is more than 5ms or so, the disks
are the limit, so just be patient.

AFAIK, there is no current rebuild characterization effort or data.
I have/had data from several years ago, but it is not useful today.
 -- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
http://nexenta-atlanta.eventbrite.com (March 16-18, 2010)

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
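A minimal sketch of the svc_t check above, for anyone who wants to scan captured `iostat` output rather than eyeball it. It assumes the Solaris `iostat -x` layout (a `svc_t` column and a `device` column) or the `iostat -xn` layout (where `asvc_t`, the active service time, is used instead); those layouts are assumptions, not checked against every release. The helper name `slow_devices` and the sample numbers are made up for illustration, and the 5 ms default is only the rule of thumb from the reply, not a hard limit.

```python
from typing import List, Optional, Tuple

# Column holding per-device service time in ms: "svc_t" in `iostat -x`,
# "asvc_t" in `iostat -xn`. Assumed layouts, for illustration only.
SVC_COLUMNS = ("svc_t", "asvc_t")


def slow_devices(iostat_text: str, threshold_ms: float = 5.0) -> List[Tuple[str, float]]:
    """Return (device, service_time_ms) for rows above threshold_ms (hypothetical helper)."""
    slow: List[Tuple[str, float]] = []
    header: Optional[List[str]] = None
    for line in iostat_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if any(col in fields for col in SVC_COLUMNS):
            # Header row; iostat repeats it every interval, so re-read it each time.
            header = fields
            continue
        if header is None or len(fields) != len(header):
            # Skip banners such as "extended device statistics".
            continue
        row = dict(zip(header, fields))
        col = "svc_t" if "svc_t" in row else "asvc_t"
        try:
            svc_ms = float(row[col])
        except ValueError:
            continue  # not a numeric data row
        if svc_ms > threshold_ms:
            slow.append((row["device"], svc_ms))
    return slow


# Made-up sample in the assumed `iostat -x` layout, not real measurements.
SAMPLE = """\
                 extended device statistics
device    r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b
sd1      85.0    2.0  640.0   16.0  0.0  1.9   22.4   0  98
sd2      84.0    1.0  630.0    8.0  0.0  0.3    3.1   0  27
"""

if __name__ == "__main__":
    for dev, ms in slow_devices(SAMPLE):
        print(f"{dev}: svc_t {ms} ms")  # prints "sd1: svc_t 22.4 ms"
```

In practice you would feed it text captured from something like `iostat -x 5` during the resilver; disks that stay consistently above the threshold point to the drives themselves, rather than the resilver code, as the limiting factor.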