On 8/10/07, Moore, Joe <[EMAIL PROTECTED]> wrote:

> Wishlist: It would be nice to put the whole redundancy definitions into
> the zfs filesystem layer (rather than the pool layer):  Imagine being
> able to "set copies=5+2" for a filesystem... (requires a 7-VDEV pool,
> and stripes via RAIDz2, otherwise the zfs create/set fails)

Yes please ;)

This is practically the holy grail of "dynamic raid": the ability to
use different redundancy settings at a per-directory level, to mix
different-sized devices, and to add or remove them at will.

I guess one could describe the feature as a ditto-block-style setting,
but for stripe+parity. It's doable, but probably requires large(ish)
changes to the on-disk structures, since the block pointer would have
to look different.
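
Just to make that concrete, here's a purely hypothetical sketch in C
(NOT the real blkptr_t, all names made up) of the kind of layout
information a block pointer would have to carry if stripe width and
parity were chosen per block instead of per vdev:

/*
 * Purely hypothetical sketch -- NOT the real ZFS blkptr_t, just made-up
 * names.  The point is only that a per-block stripe+parity policy means
 * the block pointer has to describe its own layout instead of inheriting
 * it from the vdev it lives on.
 */
#include <stdint.h>

typedef struct hyp_dva {
        uint64_t vdev;          /* which top-level vdev this column is on */
        uint64_t offset;        /* allocation offset within that vdev */
} hyp_dva_t;

typedef struct hyp_blkptr {
        uint8_t   ndata;        /* data columns, e.g. 5 for "copies=5+2" */
        uint8_t   nparity;      /* parity columns, e.g. 2 */
        uint16_t  flags;        /* e.g. per-column vs. whole-stripe checksum */
        uint32_t  column_size;  /* bytes per column (the "minchunk" knob) */
        uint64_t  cksum[4];     /* block checksum, as today */
        hyp_dva_t column[];     /* ndata + nparity DVAs instead of 3 dittos */
} hyp_blkptr_t;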

James, did you look at this? With vdev removal (which I suppose will
be implemented with some kind of "rewrite block"-type code) in place,
"reshape" and rebalance functionality would probably be relatively
small additional improvements.

BTW, here are a few more wishlist items while we're at it:

- copies=max+2 (use as many stripes as possible, with a 3-way mirror
as the border case)
- minchunk=8k (don't spread stripes into chunks smaller than this; a
performance optimization)
- checksum on every disk independently (instead of over the full
stripe), which would fix raidz random read performance; see the rough
sketch after this list
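
On that last item, here's a self-contained toy sketch (made-up code,
nothing from ZFS) of where the extra I/O comes from today and what a
per-column checksum would save:

/*
 * Hypothetical toy (NOT ZFS code): with one checksum over the whole
 * stripe, a small random read has to touch every data column before it
 * can be verified; with a checksum per column it touches only one.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define NDATA   5               /* data columns in the stripe */
#define COLSIZE 8192            /* bytes per column */

typedef struct stripe {
        uint8_t  col[NDATA][COLSIZE];
        uint64_t stripe_cksum;          /* today: one checksum per block */
        uint64_t col_cksum[NDATA];      /* wishlist: one per column/disk */
} stripe_t;

static int disk_reads;                  /* count simulated disk I/Os */

static void read_column(stripe_t *s, int c) { (void)s; (void)c; disk_reads++; }

/* toy stand-in for fletcher/sha256 */
static uint64_t cksum(const uint8_t *p, size_t len)
{
        uint64_t sum = 0;
        while (len--)
                sum = sum * 31 + *p++;
        return (sum);
}

/* today: verifying means reading the whole stripe, even for one column */
static void read_one_column_stripe_cksum(stripe_t *s, int want)
{
        for (int c = 0; c < NDATA; c++)
                read_column(s, c);
        (void) cksum((uint8_t *)s->col, sizeof (s->col));
        (void) want;
}

/* wishlist: verify only the column that was actually asked for */
static void read_one_column_col_cksum(stripe_t *s, int want)
{
        read_column(s, want);
        (void) cksum(s->col[want], COLSIZE);
}

int main(void)
{
        static stripe_t s;

        disk_reads = 0;
        read_one_column_stripe_cksum(&s, 2);
        printf("stripe-wide checksum: %d disk reads\n", disk_reads);  /* 5 */

        disk_reads = 0;
        read_one_column_col_cksum(&s, 2);
        printf("per-column checksum:  %d disk reads\n", disk_reads);  /* 1 */
        return (0);
}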

.. And one crazy idea just popped into my head: fs-level raid could be
implemented with separate parity blocks instead of the ditto
mechanism. Say, when data is first written, a normal ditto block is
used. Then later, asynchronously, the block is combined with some
other (possibly unrelated) blocks, the parity is written to a new
allocation, and the ditto block(s) are freed. When a data block is
freed (by COW), the parity needs to be recalculated before the data
block can actually be forgotten. This can be thought of as combining a
number of ditto blocks into a single parity block.
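
To make the idea concrete, here's a self-contained toy sketch (plain
XOR parity, made-up code, nothing ZFS-specific) of the combine step,
the rebuild it buys you, and the parity update a COW free would force:

/*
 * Toy sketch of "combine ditto blocks into a parity block" using plain
 * XOR parity.  Made-up code, nothing ZFS-specific.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BLK 4096

/* dst ^= src over a whole block */
static void xor_blk(uint8_t *dst, const uint8_t *src)
{
        for (size_t i = 0; i < BLK; i++)
                dst[i] ^= src[i];
}

int main(void)
{
        static uint8_t a[BLK], b[BLK], c[BLK], parity[BLK], rebuilt[BLK];

        memset(a, 0xaa, BLK);
        memset(b, 0xbb, BLK);
        memset(c, 0xcc, BLK);

        /* Asynchronous "combine" step: parity over unrelated blocks a, b, c.
         * Once the parity is safely on disk, their ditto copies can go. */
        xor_blk(parity, a);
        xor_blk(parity, b);
        xor_blk(parity, c);

        /* Losing any one block is now recoverable from the rest + parity. */
        xor_blk(rebuilt, a);
        xor_blk(rebuilt, c);
        xor_blk(rebuilt, parity);
        printf("rebuild of b: %s\n", memcmp(rebuilt, b, BLK) ? "FAILED" : "ok");

        /* When b is freed by COW, the parity has to be updated *before* b
         * can really be forgotten: new_parity = old_parity ^ b, which then
         * covers only a and c. */
        xor_blk(parity, b);
        return (0);
}

The interesting part is the bookkeeping (which blocks share a parity
group, and updating it on free), not the XOR itself; a real version
would presumably reuse the RAID-Z parity math.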

That scheme may be easier or more complicated to implement than
writing the block as stripe+parity in the first place; it depends on
the data structures, which I don't yet know intimately.

Come to think of it, it's probably best to get all these ideas out
there _before_ I start looking into the code - knowing the details has
a tendency to kill all the crazy ideas :)
