On Tue, May 30, 2006 at 02:26:07PM -0500, Anton Rang wrote:
> On May 30, 2006, at 12:23 PM, Nicolas Williams wrote:
> 
> >Another way is to have lots of pre-allocated next uberblock
> >locations, so that seek-to-one-uberblock times are always small.
> >Each uberblock can point to its predecessor and its copies and list
> >the pre-allocated possible locations of its successors.
> 
> That's a possibility, though it could be difficult to distinguish an
> uberblock from a data block after a crash (in the worst case), since now
> you're writing both into the same arena.

I don't agree.

ZFS already has to deal with this (root blocks have to be
self-checksumming).

Basically, the new uberblocks would reference their predecessors and
would include both a checksum of the predecessor and a self-checksum.
The likelihood that some non-uberblock data in a block that was once
pre-allocated as a possible uberblock location could look like a valid
uberblock can be kept vanishingly small.  OTOH, this would present an
attack vector, so such blocks should not be freed for normal filesystem
use until the uber-uberblocks have been updated and the attack vector
has been closed.
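
To make that concrete, here's a rough sketch of what such a
self-validating uberblock might carry.  This is purely hypothetical;
the layout, the UB_MAGIC value, and the sha256() helper are invented
for illustration and are not the on-disk ZFS format:

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    #define UB_MAGIC       0x0075626572626c6bULL  /* arbitrary, assumed */
    #define UB_NSUCCESSORS 8                      /* pre-allocated slots */

    /* Checksum helper; stands in for whatever the pool uses. */
    extern void sha256(const void *buf, size_t len, uint8_t out[32]);

    typedef struct uberblock {
        uint64_t ub_magic;                      /* marks an uberblock */
        uint64_t ub_txg;                        /* transaction group */
        uint64_t ub_predecessor;                /* predecessor's location */
        uint8_t  ub_pred_cksum[32];             /* predecessor's checksum */
        uint64_t ub_successor[UB_NSUCCESSORS];  /* candidate next locations */
        uint8_t  ub_self_cksum[32];             /* covers everything above */
    } uberblock_t;

    /* After a crash, a block found at a candidate location is accepted
     * only if its magic and self-checksum both hold; walking the
     * ub_predecessor links and verifying ub_pred_cksum then confirms
     * the chain, so stale data posing as an uberblock is vanishingly
     * unlikely. */
    int
    ub_valid(const uberblock_t *ub)
    {
        uint8_t cksum[32];

        if (ub->ub_magic != UB_MAGIC)
            return (0);
        sha256(ub, offsetof(uberblock_t, ub_self_cksum), cksum);
        return (memcmp(cksum, ub->ub_self_cksum, sizeof (cksum)) == 0);
    }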

>                                           You'd also need to skip past
> some disk areas (to get to the next uberblock) at each transaction,
> which will cost some small amount of bandwidth.

Yes and no.  You're doing transactions, which already means you're
punctuating writes.  And if you pre-allocate enough potential
next-uberblock locations, you can make this cost very small, even with
back-to-back transactions in the pipeline.
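
For instance (again just a sketch, not anything ZFS actually does, and
the names are invented): pick the pre-allocated candidate nearest
at-or-ahead of the current write offset, so the extra seek stays small:

    #include <stddef.h>
    #include <stdint.h>

    /* Given the pre-allocated candidate locations for the next
     * uberblock, pick the one closest at-or-ahead of the current
     * write offset, so back-to-back transactions pay almost no
     * seek penalty. */
    uint64_t
    pick_next_ub_location(uint64_t write_off, const uint64_t *cand,
        size_t ncand)
    {
        uint64_t best = cand[0];
        uint64_t best_dist = UINT64_MAX;
        size_t i;

        for (i = 0; i < ncand; i++) {
            /* Candidates behind the head cost a long seek back;
             * penalize them by measuring the wrapped distance. */
            uint64_t dist = (cand[i] >= write_off) ?
                cand[i] - write_off :
                (UINT64_MAX - write_off) + cand[i];
            if (dist < best_dist) {
                best_dist = dist;
                best = cand[i];
            }
        }
        return (best);
    }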

> Honestly, I'm not sure that focusing on latency-sensitive streaming
> applications is worth it until we can get the bandwidth issues of ZFS
> nailed down.  There's some work yet to reach the 95% of device speed
> mark.  How close does ZFS get to writing at 8 GB/sec on an F15K?

Sure.

> It's also worth noting that the customers for whom streaming is a real
> issue tend to be those who are willing to spend a lot of money for
> reliability (think replicating the whole system+storage) rather than
> compromising performance; for them, simply the checksumming overhead
> and lack of direct I/O in (today's) ZFS may be unacceptable.

Which is it?  They want reliability, or they don't?

It's not always a reliability vs. performance trade-off.  RAID-Z is a
performance improvement (for writes) over RAID-5 precisely because its
copy-on-write, transactional design, combined with ZFS's
block-checksums-in-pointers approach to integrity protection, lets it
do full-stripe writes instead of read-modify-write cycles.
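
A back-of-the-envelope illustration (my numbers, not a benchmark, and
the stripe width is assumed): the classic RAID-5 small-write penalty is
four I/Os per updated block, while a COW full-stripe write needs no
reads at all:

    #include <stdio.h>

    int
    main(void)
    {
        int n = 4;  /* data disks in an n+1 stripe (assumed width) */

        /* RAID-5 small write: read old data, read old parity,
         * write new data, write new parity. */
        int raid5_ios_per_block = 4;

        /* RAID-Z under COW: dirty blocks are batched into full-stripe
         * writes, parity is computed from data already in memory, and
         * no pre-reads are needed. */
        double raidz_ios_per_block = (double)(n + 1) / n;

        printf("RAID-5: %d I/Os per block updated (incl. 2 reads)\n",
            raid5_ios_per_block);
        printf("RAID-Z: %.2f I/Os per block (all writes, no reads)\n",
            raidz_ios_per_block);
        return (0);
    }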

The choice to use ZFS or not will depend on specific requirements that
one has identified.  If you require a system where multiple cluster
nodes can write to the same filesystems concurrently then ZFS won't do
for you, for example.  OTOH, if you want protection against bit rot then
ZFS is for you.  Etcetera, etcetera.

>                                                               Is it
> worth the effort to change ZFS to satisfy the requirements of that
> relative handful of customers?  I'd rather see us focus on adding
> functionality that we can use to sell Solaris to large numbers of
> customers, and thus building our customer base.  We have a solution
> for streaming already, while we're just entering the reliability and
> ease-of-administration space, where the real opportunity lies.

I'm pretty sure that the ZFS team has the right set of priorities and
will adjust as necessary.  I just don't buy your arguments about design :-)

Nico