That is interesting. Could this account for disproportionate kernel CPU usage for applications that perform I/O one byte at a time, as compared to other filesystems? (Never mind that the application shouldn't do that to begin with.)
No, this is entirely a matter of CPU efficiency in the current code. There are two issues; we know what they are; and we're fixing them. The first is that as we translate from znode to dnode, we throw away information along the way -- we go from znode to object number (fast), but then we have to do an object lookup to get from object number to dnode (slow, by comparison -- or more to the point, slow relative to the cost of writing a single byte). But this is just stupid, since we already have a dnode pointer sitting right there in the znode. We just need to fix our internal interfaces to expose it. The second problem is that we're not very fast at partial-block updates. Again, this is entirely a matter of code efficiency, not anything fundamental.
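To make the byte-at-a-time cost concrete, here is a minimal sketch (not from the thread, and filesystem-agnostic): every unbuffered write is a separate system call, so writing N bytes one at a time costs N kernel entries, while a user-space buffer coalesces the same data into a few large write(2) calls. The file paths and sizes are illustrative only.

```python
# Sketch: per-syscall overhead of byte-at-a-time I/O vs. buffered I/O.
import os
import tempfile

payload = b"x" * 4096

# Unbuffered: one system call per byte -- 4096 kernel entries.
fd, unbuffered_path = tempfile.mkstemp()
for i in range(len(payload)):
    os.write(fd, payload[i:i + 1])
os.close(fd)

# Buffered: same loop, but the file object batches the bytes in user
# space and issues only a handful of large write(2) calls.
fd2, buffered_path = tempfile.mkstemp()
os.close(fd2)
with open(buffered_path, "wb") as f:
    for i in range(len(payload)):
        f.write(payload[i:i + 1])

# Both files end up byte-identical; only the syscall count differs.
with open(unbuffered_path, "rb") as f:
    unbuffered_data = f.read()
with open(buffered_path, "rb") as f:
    buffered_data = f.read()
os.unlink(unbuffered_path)
os.unlink(buffered_path)
```

The data written is identical either way; the kernel CPU time discussed above comes purely from the number of kernel entries, which is why the znode-to-dnode lookup cost matters so much at one byte per call.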
I still would love to see something like fbarrier() defined by some standard (de facto or otherwise) to make the distinction between ordered writes and guaranteed persistence more easily exploitable by applications in the general case, and to encourage filesystems/storage systems to optimize for it (i.e., not have fbarrier() simply be fsync()).
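To illustrate the kind of application the hypothetical fbarrier() would serve, here is a sketch of a write-ahead pattern that needs only ordering, not durability: the commit marker must not reach stable storage before the record it covers, but neither is required to be persistent immediately. Since no fbarrier() exists in POSIX or on Solaris, the sketch falls back to os.fsync(), which provides ordering plus persistence -- strictly more than a barrier would need, and exactly the overkill the paragraph above argues against.

```python
# Sketch: ordered writes via a hypothetical fbarrier().
import os
import tempfile

def fbarrier(fd):
    # Assumption: no standard fbarrier() exists, so we fall back to
    # fsync(), paying for persistence when we only wanted ordering.
    os.fsync(fd)

# Write-ahead pattern: the record must be ordered before its commit
# marker, but neither needs to hit disk right away.
fd, log_path = tempfile.mkstemp()
os.write(fd, b"record:balance=42\n")
fbarrier(fd)  # barrier: record strictly before commit marker
os.write(fd, b"commit\n")
os.close(fd)

with open(log_path, "rb") as f:
    log = f.read()
os.unlink(log_path)
```

A filesystem that implemented a real barrier could satisfy the fbarrier() call by merely constraining write ordering in its transaction pipeline, without forcing a synchronous flush to disk.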
Totally agree.

Jeff
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss