So while I'm feeling optimistic :-) we really ought to be able to do this in two I/O operations. If we have, say, 500K of data to write (including all of the metadata), we should be able to allocate a contiguous 500K block on disk and write that with a single operation. Then we update the Uberblock.
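The two-I/O pattern above can be sketched in miniature: gather the data and its metadata into one contiguous on-disk extent with a single vectored write, then issue a second small write for the uberblock. This is a hypothetical sketch against a plain file, not ZFS's actual allocator or on-disk format; the sizes, the fixed uberblock offset, and the use of `fsync` as a stand-in for a write barrier are all assumptions.

```python
# Sketch of the "two I/O" write pattern: one contiguous gathered write
# for data + metadata, then one small uberblock update.  Layout, sizes,
# and the reserved uberblock area at offset 0 are hypothetical.
import os
import tempfile

UBER_SIZE = 1024                  # hypothetical reserved uberblock area
data = b"D" * 4096                # application data
meta = b"M" * 512                 # block pointers / other metadata

fd, path = tempfile.mkstemp()
try:
    # I/O #1: a single vectored (scatter/gather) write places data and
    # metadata in one contiguous extent, just past the uberblock area.
    os.lseek(fd, UBER_SIZE, os.SEEK_SET)
    written = os.writev(fd, [data, meta])
    assert written == len(data) + len(meta)

    # The extent must be durable before the uberblock points at it;
    # fsync() stands in for a proper write barrier here.
    os.fsync(fd)

    # I/O #2: the small uberblock update at its fixed offset.
    uber = written.to_bytes(8, "little") + b"\0" * (UBER_SIZE - 8)
    os.pwrite(fd, uber, 0)
    os.fsync(fd)

    # Verify the extent really is contiguous: data then metadata.
    assert os.pread(fd, len(data) + len(meta), UBER_SIZE) == data + meta
finally:
    os.close(fd)
    os.unlink(path)
```

The key point is that `os.writev` submits both buffers as one operation; without scatter/gather support at the driver level, the same extent would cost one I/O per buffer (or an extra memory copy to coalesce them).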
The only inherent problem preventing this right now is that we don't have general scatter/gather at the driver level (ugh). This is a bug that should be fixed, IMO. Then ZFS just needs to delay choosing physical block locations until they're being written as part of a group. (Of course, as NetApp points out in their WAFL papers, the goal of optimizing writes can conflict with the goal of optimizing reads, so taken to an extreme, this optimization isn't always desirable.)

Hi Anton,

Optimistic, a little, yes. The data blocks should have aggregated quite well into near-recordsize I/Os; are you sure they did not? No O_DSYNC in here, right?

Once the data blocks are on disk, we have the information necessary to update the indirect blocks iteratively up to the uberblock. Those are the smaller I/Os; I guess that because of ditto blocks they go to physically separate locations, by design. All of these, though, are normally done asynchronously to applications, unless the disks are flooded.

But I follow you in that it may be remotely possible to reduce the number of iterations in the process by assuming that all the I/Os will succeed, then, if some fail, fixing up the consequences, and when all is done, updating the uberblock. I would not hold my breath quite yet for that.

-r

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
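The "update the indirect blocks iteratively up to the uberblock" step described in the reply above can be sketched as a bottom-up fold: once the data blocks' physical locations (and thus their checksums) are known, each level of indirect blocks can be computed from the level below, one iteration per level, until a single root checksum remains for the uberblock. The function names, the use of SHA-256, and the two-pointers-per-indirect-block fanout are illustrative assumptions, not ZFS's actual block-pointer layout.

```python
# Illustrative sketch: fold leaf-block checksums upward, one tree
# level per iteration, until one root checksum (for the uberblock)
# remains.  Fanout and hashing are stand-ins for real block pointers.
import hashlib

def blkptr(payload: bytes) -> bytes:
    """Stand-in for a block pointer: just the block's checksum."""
    return hashlib.sha256(payload).digest()

def propagate(data_blocks, fanout=2):
    """Return (root checksum, number of iterations up the tree)."""
    level = [blkptr(b) for b in data_blocks]
    iterations = 1
    while len(level) > 1:
        # Each indirect block holds `fanout` pointers from the level
        # below; hashing its contents yields the next level's pointer.
        level = [blkptr(b"".join(level[i:i + fanout]))
                 for i in range(0, len(level), fanout)]
        iterations += 1
    return level[0], iterations

root, depth = propagate([b"blk0", b"blk1", b"blk2", b"blk3"])
```

Each `while` iteration corresponds to one round of smaller metadata I/Os; the speculative optimization mentioned above would amount to issuing all levels at once on the assumption that every write succeeds, and redoing the affected subtree only on failure.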