On Mar 20, 2011, at 12:48 PM, David Magda wrote:

> On Mar 20, 2011, at 14:24, Roy Sigurd Karlsbakk wrote:
>
>>>> It all depends on the number of drives in the VDEV(s), traffic
>>>> patterns during resilver, speed of drives, VDEV fill, etc. Still,
>>>> close to 6 days is a lot. Can you detail your configuration?
>>>
>>> How many times do we have to rehash this? The speed of resilver is
>>> dependent on the amount of data, the distribution of data on the
>>> resilvering device, the speed of the resilvering device, and the
>>> throttle. It is NOT dependent on the number of drives in the vdev.
>>
>> Thanks for clearing this up - I've been told large VDEVs lead to long
>> resilver times, but then, I guess that was wrong.
>
> There was a thread ("Suggested RaidZ configuration...") a little while
> back where the topic of IOps and resilver time came up:
>
> http://mail.opensolaris.org/pipermail/zfs-discuss/2010-September/thread.html#44633
>
> I think this message by Erik Trimble is a good summary:
hmmm... I must've missed that one, otherwise I would have said...

>> Scenario 1: I have 5 1TB disks in a raidz1, and I assume I have 128k
>> slab sizes. Thus, I have 32k of data for each slab written to each disk
>> (4x32k data + 32k parity for a 128k slab size). So, each IOPS gets to
>> reconstruct 32k of data on the failed drive. It thus takes about
>> 1TB/32k = 31e6 IOPS to reconstruct the full 1TB drive.

Here, the IOPS don't matter because the limit will be the media write
speed of the resilvering disk -- bandwidth.

>> Scenario 2: I have 10 1TB drives in a raidz1, with the same 128k slab
>> sizes. In this case, there's only about 14k of data on each drive for a
>> slab. This means each IOPS to the failed drive only writes 14k. So, it
>> takes 1TB/14k = 71e6 IOPS to complete.

Here, IOPS might matter, but I doubt it. Where we see IOPS matter is when
the block sizes are small (e.g. metadata). In some cases you can see
widely varying resilver times depending on whether the data is large or
small. These variations follow the temporal distribution of the original
data. For example, if a pool's life begins with someone loading their MP3
collection (large blocks, mostly sequential) and then working on source
code (small blocks, more random, lots of creates/unlinks), then the
resilver will be bandwidth bound while it resilvers the MP3s and then IOPS
bound while it resilvers the source. Hence, predicting when a resilver
will finish is not very accurate.

>> From this, it can be pretty easy to see that the number of required
>> IOPS to the resilvered disk goes up linearly with the number of data
>> drives in a vdev. Since you're always going to be IOPS bound by the
>> single disk resilvering, you have a fixed limit.

You will not always be IOPS bound by the resilvering disk. You will be
speed bound by the resilvering disk, where speed is either write bandwidth
or random write IOPS.
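The arithmetic in the two scenarios above can be sketched as a quick
back-of-the-envelope script. This is only an illustration of the counting
argument, not a ZFS simulation; it assumes decimal 1 TB drives and the
128k slab size from the post, and the small differences from Erik's
figures come from his rounding (e.g. "14k" for 128k/9):

```python
# Back-of-the-envelope resilver arithmetic for the raidz1 scenarios above.
# It only counts how many writes the resilvering disk needs to be rebuilt;
# real resilver time also depends on data distribution and the throttle.

TB = 10**12          # drive capacity in bytes (decimal TB)
SLAB = 128 * 1024    # 128k slab (record) size

def resilver_writes(total_disks, drive_bytes=TB, slab_bytes=SLAB):
    """Per-write size and write count to rebuild one drive in a raidz1 vdev."""
    data_disks = total_disks - 1        # one disk's worth of parity per slab
    per_disk = slab_bytes / data_disks  # bytes written to each disk per slab
    return per_disk, drive_bytes / per_disk

for width in (5, 10):
    per_disk, writes = resilver_writes(width)
    print(f"{width}-wide raidz1: {per_disk / 1024:.1f}k per write, "
          f"~{writes / 1e6:.0f}e6 writes to rebuild 1 TB")
```

The point of the comparison: the wider vdev needs more, smaller writes to
the same single disk, so whether that matters depends on whether that disk
is bandwidth bound or IOPS bound for writes of that size.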
--
richard

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss