> Nathan Kroenert wrote:
> ...
> > What if it did a double update: One to a staged area, and another
> > immediately after that to the 'old' data blocks. Still always have
> > on-disk consistency etc, at a cost of double the I/O's...
>
> This is a non-starter. Two I/Os is worse than one.

Well, that attitude may be supportable for a write-only workload, but then so is the position that you don't really need even *one* I/O (since no one will ever need to read the data and you might as well just drop it on the floor).

In the real world, data (especially database data) usually does get read after being written, and the entire reason the original poster raised the question was that it is sometimes well worth taking on some additional write overhead to reduce read overhead.

In such a situation, if you need to protect the database from partial-block updates as well as keep it reasonably laid out for sequential table access, then performing the two writes described is about as good a solution as one can get. That is especially true if the first of them can be logged (better yet, logged in NVRAM) so that its overhead can be amortized across multiple such updates by otherwise independent processes - and more especially still if, as is often the case, the same data gets updated multiple times in sufficiently close succession that instead of 2N writes you wind up needing only N+1, the last being the only one that updates the data in place after the activity has cooled down.
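Just to make that concrete, here is a rough sketch (plain C, ordinary POSIX calls) of the sort of staged double-write being described. It is purely illustrative: the file names, the block size, and the use of a plain append-only file standing in for NVRAM or a real log are all assumptions of mine, not any particular database's implementation.

/*
 * Sketch of a staged double-write: the block image is first appended to a
 * staging log (standing in for NVRAM or a write-ahead log) and flushed, and
 * only then written in place, so a torn in-place write can always be
 * repaired from the staged copy.  Names and sizes are illustrative only.
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE 8192

/* One staged record: where the block belongs, plus the block image itself. */
struct staged_block {
    uint64_t db_offset;             /* in-place location in the database file */
    unsigned char data[BLOCK_SIZE]; /* full block image */
};

/* Stage the block in the log and flush it before touching its real location. */
static int stage_block(int log_fd, uint64_t db_offset, const void *block)
{
    struct staged_block rec;
    rec.db_offset = db_offset;
    memcpy(rec.data, block, BLOCK_SIZE);

    if (write(log_fd, &rec, sizeof rec) != (ssize_t)sizeof rec)
        return -1;
    return fsync(log_fd);           /* the staged copy must be durable first */
}

/* In-place update, safe against torn writes because the staged copy exists. */
static int update_in_place(int db_fd, uint64_t db_offset, const void *block)
{
    if (pwrite(db_fd, block, BLOCK_SIZE, (off_t)db_offset) != BLOCK_SIZE)
        return -1;
    return fsync(db_fd);
}

int main(void)
{
    int db_fd  = open("table.db", O_RDWR | O_CREAT, 0644);
    int log_fd = open("stage.log", O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (db_fd < 0 || log_fd < 0) {
        perror("open");
        return EXIT_FAILURE;
    }

    unsigned char block[BLOCK_SIZE];
    memset(block, 0xab, sizeof block);   /* stand-in for the updated block */

    /* Two I/Os per update - but the first is sequential, logged, and can be
     * batched with other processes' updates; only the second is in place. */
    if (stage_block(log_fd, 3 * BLOCK_SIZE, block) != 0 ||
        update_in_place(db_fd, 3 * BLOCK_SIZE, block) != 0) {
        perror("write");
        return EXIT_FAILURE;
    }

    close(log_fd);
    close(db_fd);
    return EXIT_SUCCESS;
}

In practice the staging log gets recycled once the in-place copy is known to be durable, and when the same block is staged several times in quick succession only the last in-place write actually needs to happen - which is where the N+1 count above comes from.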
> > Of course, both of these would require non-sparse file creation for the
> > DB etc, but would it be plausible?
> >
> > For very read intensive and position sensitive applications, I guess
> > this sort of capability might make a difference?
>
> We are all anxiously awaiting data...

Then you might find it instructive to learn more about the evolution of file systems on Unix:

In The Beginning there was the block, and the block was small, and it was isolated from its brethren, and darkness was upon the face of the deep because any kind of sequential performance well and truly sucked.

Then (after an inexcusably lengthy period of such abject suckage lasting into the '80s) there came into the world FFS, and while there was still only the block, the block was at least a bit larger, and it was at least somewhat less isolated from its brethren, and once in a while it actually lived right next to them, and while sequential performance still usually sucked, at least it sucked somewhat less.

And then the disciples Kleiman and McVoy looked upon FFS and decided that mere proximity was still insufficient, and they arranged that blocks should (at least when convenient) be aggregated into small groups (56 KB actually not being all that small at the time, given the disk characteristics back then), and the Great Sucking Sound of Unix sequential-access performance was finally reduced to something at least somewhat quieter than a dull roar.

But other disciples had (finally) taken a look at commercial file systems that had been out in the real world for decades and that had had sequential performance down pretty well pat for nearly that long.

And so it came to pass that corporations like Veritas (VxFS), and SGI (EFS & XFS), and IBM (JFS) imported the concept of extents into the Unix pantheon, and the Gods of Throughput looked upon it, and it was good, and (at least in those systems) Unix sequential performance no longer sucked at all, and even non-corporate developers whose faith was strong nearly to the point of being blind could not help but see the virtues revealed there, and began incorporating extents into their own work, yea, even unto ext4.

And the disciple Hitz (for it was he, with a few others) took a somewhat different tack, and came up with a 'write anywhere file layout', but had the foresight to recognize that it needed some mechanism to address sequential performance (not to mention parity-RAID performance). So he abandoned general-purpose approaches in favor of the Appliance, and gave it most uncommodity-like but yet virtuous NVRAM to allow many consecutive updates to be aggregated into not only stripes but adjacent stripes before being dumped to disk, and the Gods of Throughput smiled upon his efforts, and they became known throughout the land.

Now comes back Sun with ZFS, apparently ignorant of the last decade-plus of Unix file system development (let alone development in other systems dating back to the '60s). Blocks, while larger (though not necessarily proportionally larger, given the dramatic increases in disk bandwidth since), are once again often isolated from their brethren. True, this makes the COW approach a lot easier to implement, but (leaving aside the debate about whether COW as implemented in ZFS is a good idea at all) there is *no question whatsoever* that it returns a significant degree of suckage to sequential performance - especially for data subject to small, random updates.

Here ends our lesson for today.

- bill

This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss