Re: [zfs-discuss] Speeding up resilver on x4500

Erik Trimble Mon, 22 Jun 2009 00:48:00 -0700

Nicholas Lee wrote:

On Mon, Jun 22, 2009 at 4:24 PM, Stuart Anderson <ander...@ligo.caltech.edu> wrote:
    However, it is a bit disconcerting to have to run with reduced data
    protection for an entire week. While I am certainly not going back to
    UFS, it seems like it should be at least theoretically possible to
    do this
    several orders of magnitude faster, e.g., what if every block on the
    replacement disk had its RAIDZ2 data recomputed from the degraded
Maybe this is also saying that, for large disk sets, a single RAIDZ2 provides a false sense of security.
Nicholas
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

I'm assuming the problem is that you are IOPS bound. Since you wrote small files, ZFS uses small stripe sizes, which means that when you need to do a full-stripe read to reconstruct the RAIDZ2 parity, you're reading only a very small amount of data. You're IOPS bound on the replacement disk.


For argument's sake, let's assume you have 4k stripe sizes. Thus, you do:

(1) 4k read across all disks
(2) checksum computation
(3) tiny write to the resilvering disk
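A toy model of steps (1) to (3) for a single stripe, just to make the I/O pattern concrete. It is a simplification, not how ZFS is actually coded: it uses only the XOR ("P") parity, which RAIDZ2 keeps alongside its second ("Q") parity, treats the replacement disk as a Python list, and uses SHA-256 as a stand-in for ZFS's own checksum. The names COLUMN_BYTES, xor_columns and resilver_stripe are hypothetical.

```python
import hashlib
import os
from functools import reduce

COLUMN_BYTES = 4096  # assumed 4k column, matching the example above


def xor_columns(columns):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, group) for group in zip(*columns))


def resilver_stripe(surviving_columns, expected_digest, replacement_disk):
    # (1) the caller has already read the surviving data columns plus parity P
    # (2) rebuild the missing column from XOR parity and verify it; sha256 is a
    #     stand-in for the real ZFS checksum, which covers the whole block
    rebuilt = xor_columns(surviving_columns)
    if hashlib.sha256(rebuilt).digest() != expected_digest:
        raise ValueError("reconstructed column failed checksum")
    # (3) tiny write: only the rebuilt column goes to the replacement disk
    replacement_disk.append(rebuilt)
    return rebuilt


# Usage: three data columns plus parity; column 1 sits on the failed disk.
data = [os.urandom(COLUMN_BYTES) for _ in range(3)]
parity = xor_columns(data)
replacement = []
rebuilt = resilver_stripe([data[0], data[2], parity],
                          hashlib.sha256(data[1]).digest(), replacement)
assert rebuilt == data[1]
```

The point of the model: every 4k of rebuilt data costs a read touching all surviving disks plus one small write, so the per-I/O overhead, not bandwidth, sets the pace.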

Assuming you might max out at 300 IOPS (not unreasonable for small reads on SATA drives), this results in:

(300 / 2) x 4kB = 600kB/s. That is, you can do 150 stripe reads and writes per second, each read/write pair reconstructing the parity for 4k of data. And that might be optimal.
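The same arithmetic as a small function, assuming (as above) that each 4k stripe costs one read and one write drawn from a single 300 IOPS budget. The function name is illustrative only.

```python
def resilver_throughput_kb_s(iops, stripe_kb, ios_per_stripe=2):
    """Reconstruction rate when each stripe costs ios_per_stripe I/Os
    (here one read plus one write) out of one shared IOPS budget."""
    stripes_per_second = iops / ios_per_stripe
    return stripes_per_second * stripe_kb


print(resilver_throughput_kb_s(300, 4))  # 600.0
```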

At that rate, 1TB of data will take (1024 x 1024 x 1024 kB) / 600kB/s = ~1.8 million seconds =~ 500 hours.
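Checking the estimate, taking 1TB as 1024 x 1024 x 1024 kB and the 600kB/s rate from above (the constant name is just for illustration):

```python
TIB_IN_KB = 1024 ** 3   # 1TB taken as 1024 * 1024 * 1024 kB
RATE_KB_S = 600         # IOPS-bound rate from the 4k example

seconds = TIB_IN_KB / RATE_KB_S
hours = seconds / 3600

print(round(seconds))  # 1789570
print(round(hours))    # 497
```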

I don't know how ZFS does the actual reconstruction, but I have two suggestions:

(1) If ZFS is doing a serial resilver (i.e. resilver stripe 1 before doing stripe 2, etc.), would it be possible NOT to do a full-stripe write when doing the reconstruction? That is, only write the reconstructed data back to the replacement disk. That would allow the "data" disks to use their full IOPS for reading, and the replacement disk its full IOPS for writing. It's still going to suck rocks, but only half as much.
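A rough model of suggestion (1), under the assumption that reads on the surviving disks and writes on the replacement disk can overlap, each side getting its own 300 IOPS. The slower side then sets the pace, instead of reads and writes splitting one budget. The function name is hypothetical.

```python
def pipelined_throughput_kb_s(read_iops, write_iops, stripe_kb):
    # Reads (surviving disks) and writes (replacement disk) run concurrently,
    # so stripes complete at the rate of the slower of the two sides.
    return min(read_iops, write_iops) * stripe_kb


shared = (300 / 2) * 4                              # 600.0 kB/s: one shared budget
pipelined = pipelined_throughput_kb_s(300, 300, 4)  # 1200 kB/s
print(pipelined / shared)  # 2.0
```

Twice the rate means half the time: roughly 250 hours instead of 500, which is the "only half as much" above.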

(2) Multiple-stripe reconstruction would probably be better; that is, ZFS should reconstruct several adjacent stripes together, up to some reasonable total size (say 1MB or so). That way, you could get reconstruction rates of 100MB/s (that is, reconstructing the parity for 100MB of data per second, NOT writing 100MB/s). 1TB of data @ 100MB/s is only about 3 hours.
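A sketch of why batching helps, assuming each batch still costs one read and one write, and that the rate is capped by the drives' streaming bandwidth. The 100MB/s cap is taken from the figure above, not measured, and it is also assumed that the disks still manage 300 IOPS at 1MB per I/O. The function name is hypothetical.

```python
TIB_IN_MB = 1024 * 1024  # 1TB taken as 1024 * 1024 MB


def batched_rate_mb_s(iops, batch_mb, bandwidth_cap_mb_s, ios_per_batch=2):
    # Same I/O count per batch as the 4k case, but each batch now carries
    # batch_mb of data; once that outruns the disks, bandwidth is the limit.
    iops_bound = iops / ios_per_batch * batch_mb
    return min(iops_bound, bandwidth_cap_mb_s)


small = batched_rate_mb_s(300, 4 / 1024, 100)  # 4k stripes: IOPS bound
large = batched_rate_mb_s(300, 1, 100)         # 1MB batches: bandwidth bound
hours = TIB_IN_MB / large / 3600
print(large, round(hours, 1))  # 100 2.9
```

With 4k stripes the function gives the same 600kB/s (about 0.59MB/s) as before; only the batch size changes.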


--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA

