On Mar 20, 2011, at 12:48 PM, David Magda wrote:

> On Mar 20, 2011, at 14:24, Roy Sigurd Karlsbakk wrote:
>
>>>> It all depends on the number of drives in the VDEV(s), traffic
>>>> patterns during resilver, speed of drives, VDEV fill, etc. Still,
>>>> close to 6 days is a lot. Can you detail your configuration?
>>>
>>> How many times do we have to rehash this? The speed of resilver is
>>> dependent on the amount of data, the distribution of data on the
>>> resilvering device, the speed of the resilvering device, and the
>>> throttle. It is NOT dependent on the number of drives in the vdev.
>>
>> Thanks for clearing this up - I've been told large VDEVs lead to long
>> resilver times, but then, I guess that was wrong.
>
> There was a thread ("Suggested RaidZ configuration...") a little while
> back where the topic of IOps and resilver time came up:
>
> http://mail.opensolaris.org/pipermail/zfs-discuss/2010-September/thread.html#44633
>
> I think this message by Erik Trimble is a good summary:
hmmm... I must've missed that one, otherwise I would have said...

>> Scenario 1: I have 5 1TB disks in a raidz1, and I assume I have 128k
>> slab sizes. Thus, I have 32k of data for each slab written to each disk
>> (4x32k data + 32k parity for a 128k slab size). So, each IOPS gets to
>> reconstruct 32k of data on the failed drive. It thus takes about
>> 1TB/32k = 31e6 IOPS to reconstruct the full 1TB drive.

Here, the IOPS don't matter because the limit will be the media write
speed of the resilvering disk -- bandwidth.

>> Scenario 2: I have 10 1TB drives in a raidz1, with the same 128k slab
>> sizes. In this case, there's only about 14k of data on each drive for a
>> slab. This means each IOPS to the failed drive only writes 14k. So, it
>> takes 1TB/14k = 71e6 IOPS to complete.

Here, IOPS might matter, but I doubt it. Where we see IOPS matter is when
the block sizes are small (e.g. metadata). In some cases you can see
widely varying resilver times depending on whether the data is large or
small. These variations follow the temporal distribution of the original
data. For example, if a pool's life begins with someone loading their MP3
collection (large blocks, mostly sequential) and then working on source
code (small blocks, more random, lots of creates/unlinks), then the
resilver will be bandwidth bound while it resilvers the MP3s and then IOPS
bound while it resilvers the source. Hence, predicting when a resilver
will finish is not very accurate.

>> From this, it can be pretty easy to see that the number of required
>> IOPS to the resilvered disk goes up linearly with the number of data
>> drives in a vdev. Since you're always going to be IOPS bound by the
>> single disk resilvering, you have a fixed limit.

You will not always be IOPS bound by the resilvering disk. You will be
speed bound by the resilvering disk, where speed is either write bandwidth
or random write IOPS.
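The arithmetic in the two scenarios above can be sketched as a quick
back-of-the-envelope script. This is only an illustration of the counting
argument, not a ZFS simulation; it assumes decimal 1 TB drives and the
128k slab size from the post, and the small differences from Erik's
figures come from his rounding (e.g. "14k" for 128k/9):

```python
# Back-of-the-envelope resilver arithmetic for the raidz1 scenarios above.
# It only counts how many writes the resilvering disk needs to be rebuilt;
# real resilver time also depends on data distribution and the throttle.

TB = 10**12          # drive capacity in bytes (decimal TB)
SLAB = 128 * 1024    # 128k slab (record) size

def resilver_writes(total_disks, drive_bytes=TB, slab_bytes=SLAB):
    """Per-write size and write count to rebuild one drive in a raidz1 vdev."""
    data_disks = total_disks - 1        # one disk's worth of parity per slab
    per_disk = slab_bytes / data_disks  # bytes written to each disk per slab
    return per_disk, drive_bytes / per_disk

for width in (5, 10):
    per_disk, writes = resilver_writes(width)
    print(f"{width}-wide raidz1: {per_disk / 1024:.1f}k per write, "
          f"~{writes / 1e6:.0f}e6 writes to rebuild 1 TB")
```

The point of the comparison: the wider vdev needs more, smaller writes to
the same single disk, so whether that matters depends on whether that disk
is bandwidth bound or IOPS bound for writes of that size.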
--
richard

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss