On Tue, May 30, 2006 at 02:26:07PM -0500, Anton Rang wrote:
> On May 30, 2006, at 12:23 PM, Nicolas Williams wrote:
>
> > Another way is to have lots of pre-allocated next uberblock
> > locations, so that seek-to-next-uberblock times are always small.
> > Each uberblock can point to its predecessor and its copies and
> > list the pre-allocated possible locations of its successors.
>
> That's a possibility, though it could be difficult to distinguish an
> uberblock from a data block after a crash (in the worst case), since
> now you're writing both into the same arena.

I don't agree.  ZFS already has to deal with this (root blocks have to
be self-checksumming).  Basically, each new uberblock would reference
its predecessor and would include both a checksum of that predecessor
and a self-checksum.  The likelihood that non-uberblock data landing in
a block that was once pre-allocated as a possible uberblock location
could pass for a valid uberblock can be kept vanishingly small.

OTOH, this would present an attack vector, so such blocks should not be
freed for normal filesystem use until the uber-uberblocks have been
updated and that window has been closed.
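To make that concrete, here's a rough sketch of what such a candidate
uberblock might carry and how it would be validated after a crash.
This is not real ZFS code; the layout, field names, slot count, and
the toy checksum are all just assumptions for illustration (ZFS would
use fletcher or SHA-256):

    /*
     * Illustrative sketch only -- not actual ZFS code.  Layout, names,
     * and the toy checksum are assumptions for discussion.
     */
    #include <stdint.h>
    #include <stdio.h>

    #define UB_MAGIC        0xbabb10cULL  /* arbitrary magic for the sketch */
    #define UB_NSUCCESSORS  16            /* pre-allocated successor slots */

    typedef struct candidate_uberblock {
        uint64_t ub_magic;                     /* marks a candidate uberblock */
        uint64_t ub_txg;                       /* txg this uberblock commits */
        uint64_t ub_prev_offset;               /* disk address of predecessor */
        uint64_t ub_prev_checksum;             /* checksum of the predecessor */
        uint64_t ub_successor[UB_NSUCCESSORS]; /* possible next-ub locations */
        uint64_t ub_self_checksum;             /* over this block, field zeroed */
    } candidate_uberblock_t;

    /* Toy stand-in for a real checksum. */
    static uint64_t
    toy_checksum(const void *buf, size_t len)
    {
        const uint8_t *p = buf;
        uint64_t sum = 0;
        for (size_t i = 0; i < len; i++)
            sum = sum * 31 + p[i];
        return (sum);
    }

    /* Compute the self-checksum with ub_self_checksum treated as zero. */
    static uint64_t
    ub_compute_self_checksum(const candidate_uberblock_t *ub)
    {
        candidate_uberblock_t tmp = *ub;
        tmp.ub_self_checksum = 0;
        return (toy_checksum(&tmp, sizeof (tmp)));
    }

    /*
     * After a crash, a block read from a pre-allocated slot is accepted
     * as the next uberblock only if its magic matches, its self-checksum
     * verifies, and its predecessor checksum matches the uberblock we
     * already trust.  Stray data passing all three tests is what can be
     * kept vanishingly unlikely.
     */
    static int
    ub_valid(const candidate_uberblock_t *ub, uint64_t trusted_prev_checksum)
    {
        return (ub->ub_magic == UB_MAGIC &&
            ub->ub_self_checksum == ub_compute_self_checksum(ub) &&
            ub->ub_prev_checksum == trusted_prev_checksum);
    }

    int
    main(void)
    {
        candidate_uberblock_t ub = {
            .ub_magic = UB_MAGIC,
            .ub_txg = 42,
            .ub_prev_checksum = 0xdeadbeefULL,  /* from trusted predecessor */
        };
        ub.ub_self_checksum = ub_compute_self_checksum(&ub);
        printf("valid: %d\n", ub_valid(&ub, 0xdeadbeefULL));
        return (0);
    }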
> You'd also need to skip past some disk areas (to get to the next
> uberblock) at each transaction, which will cost some small amount of
> bandwidth.

Yes and no.  You're doing transactions, which already means you're
punctuating writes.  And if you pre-allocate enough potential
next-uberblocks, you can keep this cost very small, even when you have
back-to-back transactions in the pipeline.

> Honestly, I'm not sure that focusing on latency-sensitive streaming
> applications is worth it until we can get the bandwidth issues of ZFS
> nailed down.  There's some work yet to reach the 95% of device speed
> mark.

How close does ZFS get to writing at 8 GB/sec on an F15K?  Sure.

> It's also worth noting that the customers for whom streaming is a
> real issue tend to be those who are willing to spend a lot of money
> for reliability (think replicating the whole system+storage) rather
> than compromising performance; for them, simply the checksumming
> overhead and lack of direct I/O in (today's) ZFS may be unacceptable.

Which is it?  Do they want reliability, or don't they?  It's not always
a reliability vs. performance trade-off.  RAID-Z is a performance
improvement (for writes) over RAID-5 precisely because of the
COW/transactional design plus ZFS's block-checksums-in-pointers
approach to integrity protection: full-stripe, dynamically sized writes
avoid RAID-5's read-modify-write cycle for parity updates.

The choice to use ZFS or not will depend on the specific requirements
one has identified.  If you require a system where multiple cluster
nodes can write to the same filesystems concurrently, then ZFS won't do
for you, for example.  OTOH, if you want protection against bit rot,
then ZFS is for you.  Etcetera, etcetera.

> Is it worth the effort to change ZFS to satisfy the requirements of
> that relative handful of customers?  I'd rather see us focus on
> adding functionality that we can use to sell Solaris to large numbers
> of customers, and thus building our customer base.  We have a
> solution for streaming already, while we're just entering the
> reliability and ease-of-administration space, where the real
> opportunity lies.

I'm pretty sure that the ZFS team has the right set of priorities and
will adjust as necessary.  I just don't buy your arguments about
design :-)

Nico