On 09/02/2010, at 00.23, Daniel Carosone wrote:

> On Mon, Feb 08, 2010 at 11:28:11PM +0100, Lasse Osterild wrote:
>> Ok, thanks. I know that the amount of used space will vary, but what's
>> the usefulness of the total size when, e.g. in my pool above, 4 x 1G
>> (roughly, depending on recordsize) is reserved for parity?  It's not
>> like it's usable for anything else :)  I just don't see the point
>> when it's a raidz or raidz2 pool, but I guess I am missing something
>> here.
> 
> The basis of raidz is that each block is its own raid stripe, with its
> own layout.  At present, this only matters for the size of the stripe.
> For example, if I write a single 512-byte block to a dual-parity
> raidz2, I will write three sectors (one data plus two parity) to
> three disks.  With a larger block, the data spreads over more disks,
> until the block is big enough to stripe evenly over all of them.  As
> the block gets bigger still, more is written to each disk as part of
> the stripe, and the parity units grow to match the size of the
> largest data unit.
> This "rounding" can very often mean that different disks have
> different amounts of data for each stripe.  
> 
> Crucially, it also means the ratio of parity-to-data is not fixed.
> This tends to average out on a pool with lots of data and mixed 
> block sizes, but not always; consider the extreme case of a pool
> containing only datasets with blocksize=512, where every data sector
> carries a full set of parity sectors.  That's what the comments
> in the documentation are referring to, and the major reason for the
> zpool output you see.
> 
> In future, this per-block flexibility may go further and matter more.
> 
> Just as the data count per stripe can vary, there's nothing
> fundamental in the raidz layout that says that the same parity count
> and method has to be used for the entire pool, either.  Raidz already
> degrades to simple mirroring in some of the same small-stripe cases
> discussed above.
> 
> There's no particular reason, in theory, why blocks could not also have
> different amounts of parity on a per-block basis.  I imagine that when
> bp-rewrite and the ability to reshape pools come along, this will
> indeed be the case, at least during the transition.  As a simple example,
> when reshaping a raidz1 to a raidz2 by adding a disk, there will be
> blocks with single parity and others with dual parity for a time,
> until the operation finishes.
> 
> Maybe one day in the future, there will just be a basic "raidz" vdev
> type, and we can set dataset properties for the number of additional
> parity blocks each should get.  This might be a little like the way we
> can currently set "copies": it would only affect new writes and could
> lead to very mixed redundancy states.
> 
> No one has actually said this is a real goal, and the reasons it's not
> presently allowed include administrative and operational simplicity as
> well as implementation and testing constraints, but I think it would
> be handy and cool.  
> 
> --
> Dan.

Thanks Dan! :)

That explanation made perfect sense, and I appreciate you taking the time to
write it up.  Perhaps parts of it could go into the FAQ?  I realise it's
sort of in there already, but it doesn't explain this very well.  To make
sure I actually follow, I tried to turn the arithmetic into a couple of
quick sketches below.
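
To check my understanding, here's a rough Python sketch of the space
accounting as I picture it from your description.  The function and the
padding rule at the end are my own guesses at how the raidz allocator
behaves (not lifted from the ZFS source), so please correct me if it's off:

  import math

  # Rough sketch, not the actual ZFS code: sectors used by one logical
  # block on a raidz vdev, following the per-block stripe idea above.
  def raidz_sectors(psize, ndisks, nparity, sector=512):
      data = math.ceil(psize / sector)              # data sectors for the block
      rows = math.ceil(data / (ndisks - nparity))   # stripe rows the data spans
      parity = rows * nparity                       # one parity sector per row
      total = data + parity
      # Assumption on my part: the allocator pads to a multiple of
      # (nparity + 1) sectors so freed space stays reusable.
      return data, parity, math.ceil(total / (nparity + 1)) * (nparity + 1)

  # Your example: a single 512-byte block on a hypothetical 6-disk raidz2
  print(raidz_sectors(512, ndisks=6, nparity=2))         # (1, 2, 3)
  # A 128 KiB block on the same vdev stripes over all six disks
  print(raidz_sectors(128 * 1024, ndisks=6, nparity=2))  # (256, 128, 384)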
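
And the extreme case you mention, a pool holding nothing but
blocksize=512 datasets, falls out of the same toy arithmetic (again my
own numbers, not anything measured):

  # Parity-to-data ratio for the two extremes on a hypothetical 6-disk
  # raidz2, with the toy layout rule written out inline so it stands alone.
  sector, ndisks, nparity = 512, 6, 2
  for psize in (512, 128 * 1024):
      data = -(-psize // sector)             # ceiling division: data sectors
      rows = -(-data // (ndisks - nparity))  # stripe rows across 4 data columns
      parity = rows * nparity                # two parity sectors per row
      print(psize, "bytes ->", parity / data, "parity sectors per data sector")
  # 512 bytes -> 2.0 (200% overhead); 131072 bytes -> 0.5 (the nominal 2 of 6)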

Cheers,

 - Lasse
