On 09/02/2010, at 00.23, Daniel Carosone wrote:

> On Mon, Feb 08, 2010 at 11:28:11PM +0100, Lasse Osterild wrote:
>> OK, thanks. I know that the amount of used space will vary, but
>> what's the usefulness of the total size when, as in my pool above,
>> 4 x 1G (roughly, depending on recordsize) are reserved for parity?
>> It's not as if it's usable for anything else :) I just don't see
>> the point when it's a raidz or raidz2 pool, but I guess I'm missing
>> something here.
>
> The basis of raidz is that each block is its own raid stripe, with
> its own layout. At present, this only matters for the size of the
> stripe. For example, if I write a single 512-byte block to a
> dual-parity raidz2, I will write three blocks, to three disks. With
> a larger block, I will have more data over more disks, until the
> block is big enough to stripe evenly over all of them. As the block
> gets bigger yet, more is written to each disk as part of the stripe,
> and the parity units get bigger to match the size of the largest
> data unit. This "rounding" can very often mean that different disks
> have different amounts of data for each stripe.
>
> Crucially, it also means the ratio of parity to data is not fixed.
> This tends to average out on a pool with lots of data and mixed
> block sizes, but not always; consider an extreme case of a pool
> containing only datasets with blocksize=512. That's what the
> comments in the documentation are referring to, and the major
> reason for the zpool output you see.
>
> In future, it may go further and be more important.
>
> Just as the data count per stripe can vary, there's nothing
> fundamental in the raidz layout that says that the same parity
> count and method has to be used for the entire pool, either. Raidz
> already degrades to simple mirroring in some of the same
> small-stripe cases discussed above.
>
> There's no particular reason, in theory, why blocks could not also
> have different amounts of parity on a per-block basis. I imagine
> that when bp-rewrite and the ability to reshape pools come along,
> this will indeed be the case, at least during the transition. As a
> simple example, when reshaping a raidz1 to a raidz2 by adding a
> disk, there will be blocks with single parity and blocks with dual
> parity for a time, until the operation is finished.
>
> Maybe one day in the future there will just be a basic "raidz" vdev
> type, and we can set dataset properties for the number of
> additional parity blocks each should get. This might be a little
> like the way we can currently set "copies", including that it would
> only affect new writes and lead to very mixed redundancy states.
>
> No one has actually said this is a real goal, and the reasons it's
> not presently allowed include administrative and operational
> simplicity as well as implementation and testing constraints, but I
> think it would be handy and cool.
>
> --
> Dan.
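To make the overhead arithmetic above concrete, here is a rough Python
sketch of the per-block allocation rule, modeled on the asize
calculation in ZFS's vdev_raidz.c. The 512-byte sector size (ashift=9)
and the helper name raidz_asize are illustrative assumptions, not
anything from the thread:

# A rough sketch (not the authoritative implementation) of how many
# 512-byte device sectors a single raidz block consumes, modeled on
# the asize calculation in ZFS's vdev_raidz.c.  Assumes ashift=9
# devices; the names below are illustrative.

SECTOR = 512

def raidz_asize(psize, ndisks, nparity):
    """Sectors allocated for a block of psize bytes, parity included."""
    ndata = ndisks - nparity
    sectors = -(-psize // SECTOR)              # data sectors, rounded up
    sectors += nparity * -(-sectors // ndata)  # parity sectors, one per row
    # Round up to a multiple of nparity + 1 so freed segments always
    # stay large enough to hold another minimal allocation.
    return -(-sectors // (nparity + 1)) * (nparity + 1)

# Dan's example: a single 512-byte block on a 5-wide raidz2 occupies
# three sectors (1 data + 2 parity), a 200% parity overhead.  As the
# block grows, the overhead falls toward nparity/ndata (2/3 here):
for psize in (512, 4096, 131072):
    a = raidz_asize(psize, ndisks=5, nparity=2)
    print(f"{psize:>7} bytes -> {a} sectors, "
          f"{a * SECTOR / psize - 1:.0%} overhead")

This is why a pool full of tiny blocks can show far more space lost to
parity than the nominal data/parity split suggests, and why zpool
reports raw capacity rather than guessing a fixed ratio.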
Thanks, Dan! :) That explanation made perfect sense, and I appreciate
you taking the time to write it. Perhaps parts of it could go into the
FAQ? I realise it's sort of in there already, but it doesn't explain
this very well.

Cheers,
- Lasse