On Mar 2, 2010, at 2:41 PM, Jeffrey Johnson wrote:
> Hi Folks,
> 
> We have put together a 25T ZFS raidz2 zpool (16x2TB 5900 RPM 32MB
> Cache SATA 3.0Gb/s drives with 2x LSI SAS3081E-R SAS RAID Controllers
> presenting the drives as JBOD straight thru to the backplane) with 2
> hot-spares on OpenSolaris snv_133. The pool contains roughly 800
> million files, all of which are very small (~10-200 KB map tiles). We had a
> hiccup with one of the drives and the resilvering process was
> initiated ... the problem is that zpool status is estimating something
> like 650 hours currently. This estimate has varied from 400 to 1800 as
> it has run over the last couple of days, but it seems to have settled
> around 650 now. That is just WAY too long ... we fear that if the end
> user of this device ever has to replace a drive in the pool, it will
> take this long to rebuild again.
> 
> So, we are wondering if a) there is some way we can optimize or tune
> the pool to deal with this number of small files better and speed up
> the resilvering process or b) some way we can tweak the resilvering
> code to handle this type of situation better.
> 
> One of our engineers is looking at setting up a VM on another machine
> and using dtrace to find out where the bottleneck is, but we thought
> we might have more luck on this list.

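Those are slow drives, so it will take a while to resilver. To verify that
disk I/O is the bottleneck, run iostat with extended statistics (iostat -x)
and observe svc_t, the average service time per device in milliseconds. If
it is more than 5 ms or so, the disks themselves are the limit, so just be
patient.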
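
The svc_t check described above can be scripted. What follows is only a
minimal sketch: it assumes the Solaris "iostat -x" column layout, finds the
svc_t column from the header line rather than hard-coding its position, and
uses made-up sample numbers (SAMPLE, sd1 and sd2 are hypothetical, not
measurements from this pool). The function name slow_devices is likewise
just an illustration.

```python
SVC_T_LIMIT_MS = 5.0  # the "5ms or so" rule of thumb from the message


def slow_devices(iostat_text, limit_ms=SVC_T_LIMIT_MS):
    """Return {device: svc_t} for rows whose svc_t exceeds limit_ms.

    The svc_t column is located from the header line, so the parser does
    not depend on a fixed column position. If the text holds several
    sampling intervals, the last value seen for each device wins.
    """
    slow = {}
    svc_col = None
    for line in iostat_text.splitlines():
        fields = line.split()
        if "svc_t" in fields:           # header row: remember the column
            svc_col = fields.index("svc_t")
            continue
        if svc_col is None or len(fields) <= svc_col:
            continue                    # banner or blank line
        try:
            svc_t = float(fields[svc_col])
        except ValueError:
            continue                    # not a data row
        if svc_t > limit_ms:
            slow[fields[0]] = svc_t
    return slow


# Hypothetical sample in the assumed "iostat -x" layout.
SAMPLE = """\
                 extended device statistics
device    r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b
sd1      95.0    2.0  610.0   12.0  0.0  1.9   19.6   0  98
sd2       3.0   88.0   20.0  540.0  0.0  0.3    3.1   0  27
"""

if __name__ == "__main__":
    for dev, svc in sorted(slow_devices(SAMPLE).items()):
        print(f"{dev}: svc_t {svc:.1f} ms")
```

In practice the text would come from captured iostat output rather than
SAMPLE; how that output is collected is left out here on purpose.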

AFAIK, there is no current rebuild characterization effort or data. I have/had
data from several years ago, but it is not useful today.
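
Absent such data, the only numbers to reason from are the ones in the
original post. As a rough sketch, the arithmetic below converts the reported
estimates (400, 650 and 1800 hours) into days and into an implied average
rate over the roughly 800 million files. Files per second is used only as a
convenient proxy; it is an assumption of this sketch, since a resilver walks
blocks rather than files, and the helper name files_per_second is
hypothetical.

```python
FILES = 800_000_000                 # "roughly 800 Million files" from the post
ESTIMATES_HOURS = (400, 650, 1800)  # zpool status estimates quoted in the post


def files_per_second(files, hours):
    """Average files covered per second if the job takes `hours` hours."""
    return files / (hours * 3600)


if __name__ == "__main__":
    for hours in ESTIMATES_HOURS:
        days = hours / 24
        rate = files_per_second(FILES, hours)
        print(f"{hours:>5} h = {days:5.1f} days, about {rate:4.0f} files/s")
```

At the 650 hour figure this works out to about 27 days and an average on
the order of 340 files per second across the whole pool.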
 -- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
http://nexenta-atlanta.eventbrite.com (March 16-18, 2010)




_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
