On May 30, 2006, at 12:23 PM, Nicolas Williams wrote:

> Another way is to have lots of pre-allocated next uberblock locations,
> so that seek-to-one-uberblock times are always small.  Each uberblock
> can point to its predecessor and its copies and list the pre-allocated
> possible locations of its successors.

That's a possibility, though it could be difficult to distinguish an
uberblock from a datablock after a crash (in the worst case), since now
you're writing both into the same arena.  You'd also need to skip past
some disk areas (to get to the next uberblock) at each transaction,
which will cost some small amount of bandwidth.
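To make the scheme concrete, here is a rough sketch of the kind of record Nicolas describes -- purely illustrative, not the actual ZFS uberblock_t, and all field names and sizes are made up.  The magic and checksum fields are what you'd lean on to tell an uberblock apart from ordinary data after a crash; the candidate-successor list is what keeps the next uberblock write close to the current write frontier.

/*
 * Illustrative sketch only -- NOT the real ZFS on-disk uberblock.
 * A "chained" uberblock along the lines Nicolas suggests.
 */
#include <stdint.h>

#define CHAINED_UB_MAGIC   0xDEADC0DEULL    /* placeholder magic value */
#define UB_NCOPIES         3                /* redundant copies of this uberblock */
#define UB_NSUCCESSORS     8                /* pre-allocated candidate successor slots */

typedef struct chained_uberblock {
        uint64_t ub_magic;                          /* marks this block as an uberblock */
        uint64_t ub_txg;                            /* transaction group it commits */
        uint64_t ub_checksum[4];                    /* self-checksum, verified on import */

        uint64_t ub_prev_offset;                    /* disk offset of the predecessor */
        uint64_t ub_copy_offsets[UB_NCOPIES];       /* where the copies of this block live */
        uint64_t ub_next_candidates[UB_NSUCCESSORS];/* pre-allocated slots where the
                                                       successor may be written, kept near
                                                       the write frontier so the seek
                                                       stays short */
} chained_uberblock_t;

The crash-recovery wrinkle above shows up here: after a crash you would have to probe the candidate slots and rely on the magic, checksum, and txg to decide whether what you found is really the newest uberblock, or just data that happens to occupy a reused slot.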

> The on-disk layout of ZFS does not dictate block allocation policies.

Precisely, which is why I broke the issues apart.  Two of them, at least,
can be attacked through simple code changes.  The uberblock update may or
may not be an issue.  It would be interesting to test this by changing
the implementation in the other areas and seeing whether we can match
the streaming performance of other file systems, and where the
bottlenecks are.

It's worth pointing out (maybe?) that having an uberblock (or, for that
matter, an indirect block) stored in the "middle" of your data may be a
problem, if it results in issuing a short read to the disk.  Performance
is better if you read 4 MB from disk and throw out a small piece in the
middle than if you do a 2 MB read followed by a slightly shorter read
to skip the piece you don't want.  Again, this does not require an
on-disk layout change.
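As a sketch of the point (hypothetical sizes and offsets, plain pread(2) -- not ZFS code): one large read that spans the unwanted block and discards it generally beats two shorter reads that skip it, because the second request pays another setup and possibly a missed revolution.

/*
 * Two ways to fetch 4 MB of wanted data with an unwanted block
 * (e.g. an indirect block) embedded in the middle of the span.
 * Sizes and offsets are made up for illustration.
 */
#include <sys/types.h>
#include <unistd.h>

#define CHUNK       (4UL << 20)        /* 4 MB of data we actually want */
#define SKIP_OFF    (2UL << 20)        /* offset of the block we don't want... */
#define SKIP_LEN    (128UL << 10)      /* ...and its length */

/* Pattern A: one large read; the caller just ignores
 * buf[SKIP_OFF .. SKIP_OFF + SKIP_LEN). */
ssize_t
read_one_pass(int fd, off_t off, char *buf)
{
        return pread(fd, buf, CHUNK + SKIP_LEN, off);
}

/* Pattern B: a 2 MB read followed by a slightly shorter read that
 * skips the unwanted block -- the second request costs another
 * round trip to the device. */
ssize_t
read_two_pass(int fd, off_t off, char *buf)
{
        ssize_t a = pread(fd, buf, SKIP_OFF, off);
        ssize_t b = pread(fd, buf + SKIP_OFF,
            CHUNK - SKIP_OFF, off + SKIP_OFF + SKIP_LEN);
        return (a < 0 || b < 0) ? -1 : a + b;
}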

Honestly, I'm not sure that focusing on latency-sensitive streaming
applications is worth it until we can get the bandwidth issues of ZFS
nailed down.  There's still some work to do before we reach the
95%-of-device-speed mark.  How close does ZFS get to writing at
8 GB/sec on an F15K?
It's also worth noting that the customers for whom streaming is a real
issue tend to be those who are willing to spend a lot of money for
reliability (think replicating the whole system+storage) rather than
compromising performance; for them, simply the checksumming overhead
and lack of direct I/O in (today's) ZFS may be unacceptable.  Is it
worth the effort to change ZFS to satisfy the requirements of that
relative handful of customers?  I'd rather see us focus on adding
functionality that we can use to sell Solaris to large numbers of
customers, and thus building our customer base.  We have a solution
for streaming already, while we're just entering the reliability and
ease-of-administration space, where the real opportunity lies.

-- Anton

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
