That is interesting. Could this account for disproportionate kernel
CPU usage for applications that perform I/O one byte at a time, as
compared to other filesystems? (Never mind that the application
shouldn't do that to begin with.)

No, this is entirely a matter of CPU efficiency in the current code.
There are two issues; we know what they are; and we're fixing them.

The first is that as we translate from znode to dnode, we throw away
information along the way -- we go from znode to object number (fast),
but then we have to do an object lookup to get from object number to
dnode (slow, by comparison -- or more to the point, slow relative to
the cost of writing a single byte).  But this is just stupid, since
we already have a dnode pointer sitting right there in the znode.
We just need to fix our internal interfaces to expose it.
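
To make the two paths concrete, here's a minimal sketch in C.  The
types and names (z_dnode, obj_table, the get_dnode_* helpers) are
deliberately simplified and hypothetical -- the real znode/dnode
structures and lookup interfaces are more involved -- but the shape of
the problem is the same:

#include <stdint.h>
#include <stddef.h>

/* Hypothetical, simplified types; the real ZFS structures differ. */
typedef struct dnode { uint64_t dn_object; } dnode_t;

typedef struct znode {
        uint64_t  z_id;     /* object number: all the slow path keeps */
        dnode_t  *z_dnode;  /* the pointer we already hold anyway */
} znode_t;

#define OBJ_TABLE_SIZE 1024
static dnode_t *obj_table[OBJ_TABLE_SIZE];  /* stand-in for the object index */

/* Slow path: re-derive the dnode from the object number on every write. */
static dnode_t *
get_dnode_slow(const znode_t *zp)
{
        /* the real lookup also hashes, searches, and takes locks */
        return (obj_table[zp->z_id % OBJ_TABLE_SIZE]);
}

/* Fast path: just dereference the pointer the znode already carries. */
static dnode_t *
get_dnode_fast(const znode_t *zp)
{
        return (zp->z_dnode);
}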

The second problem is that we're not very fast at partial-block
updates.  Again, this is entirely a matter of code efficiency,
not anything fundamental.
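
For context, a partial-block update under copy-on-write boils down to
a read-modify-write of the whole block.  The sketch below uses
illustrative names, not the actual ZFS code path; the point is that
the fixed per-call overhead around this path, not the byte itself, is
what dominates a one-byte write:

#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE (128 * 1024)  /* illustrative record size */

/* new_block must be a fresh BLOCK_SIZE buffer (copy-on-write). */
static void
partial_block_write(const uint8_t *old_block, uint8_t *new_block,
    size_t offset, const uint8_t *data, size_t len)
{
        memcpy(new_block, old_block, BLOCK_SIZE);  /* carry old contents */
        memcpy(new_block + offset, data, len);     /* apply the small update */
        /* the real path would then checksum and queue the block for I/O */
}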

I still would love to see something like fbarrier() defined by some
standard (de facto or otherwise) to make the distinction between
ordered writes and guaranteed persistence more easily exploited in the
general case for applications, and to encourage filesystems/storage
systems to optimize for that case (i.e., not have fbarrier() simply
be fsync()).

Totally agree.
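
A rough sketch of how an application might use it -- fbarrier() here
is purely hypothetical, since no standard defines it, so the portable
fallback just degrades to fsync().  A filesystem that already orders
its writes could make it a no-op:

#include <unistd.h>

/* Hypothetical interface: constrain ordering only, not durability. */
static int
fbarrier(int fd)
{
        return (fsync(fd));  /* conservative fallback, not the fast case */
}

/* Ordering without waiting: the commit record must never be visible
 * before the data it commits, but neither write needs to be durable yet. */
static int
append_ordered(int fd, const void *data, size_t dlen,
    const void *commit, size_t clen)
{
        if (write(fd, data, dlen) != (ssize_t)dlen)
                return (-1);
        if (fbarrier(fd) != 0)
                return (-1);
        if (write(fd, commit, clen) != (ssize_t)clen)
                return (-1);
        return (0);
}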

Jeff