Writes to ZFS objects have significant data and metadata implications because of the ZFS copy-on-write implementation. As data is written into a file object, for example, that update must eventually be written to a new location on physical disk, and all of the metadata (from the uberblock down to that object) must be updated and rewritten to new locations as well. While these objects sit in cache, the changes to them can be consolidated; but once they are written out to disk, any further change makes that recent write obsolete and requires the whole set to be written once again to yet another new location on the disk.
Batching transactions for 5 seconds (the trigger interval discussed in the ZFS documentation) is therefore essential to limiting the amount of redundant rewriting that reaches physical disk. Keeping a disk busy 100% of the time writing mostly the same data over and over makes far less sense than collecting a group of changes in cache and writing them out efficiently once per trigger period.
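To make that concrete, here is a rough toy model of the effect (not actual ZFS code -- the tree depth and the update pattern are made-up illustrative numbers, not real on-disk parameters):

# Toy model of the batching argument above: each logical update dirties its
# data block plus every metadata block on the path from the uberblock down
# to it, and all of them must be rewritten to new locations.  If the same
# hot blocks are overwritten repeatedly within the 5-second window, only
# the final version of each has to reach disk.

TREE_DEPTH = 6        # assumed number of metadata blocks above a data block

def blocks_written(updates, batched):
    """Count physical block writes for a stream of block-number updates."""
    dirty = set(updates) if batched else list(updates)
    return len(dirty) * (1 + TREE_DEPTH)   # data block + its metadata path

# 1000 small writes that keep landing on the same 10 hot blocks:
updates = [i % 10 for i in range(1000)]
print("flush every update:", blocks_written(updates, batched=False))   # 7000
print("flush once per txg:", blocks_written(updates, batched=True))    # 70

In reality the metadata paths of neighboring blocks overlap, so the batched case is even cheaper than this model suggests, but the direction of the effect is the point.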
Even with this optimization, our experience with small sequential writes (4KB or less) to zvols that have been previously written (to ensure real space is already mapped on the physical disk) shows bandwidth of less than 10% of comparable larger (128KB or larger) writes.
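A quick sketch of that kind of comparison is below. This is only an outline, not a careful benchmark: the zvol device path is a placeholder for your own pool/volume, and a serious test would also pre-write the volume and pin down the sync semantics.

import os, time

ZVOL  = "/dev/zvol/rdsk/tank/testvol"   # placeholder -- substitute your own zvol
TOTAL = 256 * 1024 * 1024               # bytes to write per run

def seq_write_bw(block_size):
    """Sequential write bandwidth in MB/s for one block size."""
    buf = b"\0" * block_size
    # the raw (rdsk) device is used so writes go straight to the volume
    fd = os.open(ZVOL, os.O_WRONLY)
    start = time.time()
    written = 0
    while written < TOTAL:
        written += os.write(fd, buf)
    os.close(fd)
    return written / (time.time() - start) / (1024.0 * 1024.0)

for bs in (4 * 1024, 128 * 1024):
    print("%6d-byte writes: %8.1f MB/s" % (bs, seq_write_bw(bs)))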
You can see this behavior dramatically if you compare the amount of host-initiated write data (front-end data) to the actual amount of I/O performed to the physical disks (both reads and writes) to service the host's front-end requests. For example, doing sequential 1MB writes to a previously written zvol (a simple concatenation of 5 FC drives in a JBOD), writing 2GB of data induced more than 4GB of I/O to the drives; with smaller write sizes this ratio gets progressively worse.
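For reference, the ratio in that example works out as below; the disk-side total would normally be gathered by watching something like zpool iostat -v (or iostat -xn) while the test runs.

GB = 1024 ** 3

def amplification_ratio(host_bytes, physical_io_bytes):
    """Total physical disk I/O (reads + writes) per byte of front-end data."""
    return physical_io_bytes / float(host_bytes)

# The 1MB-write example above: 2GB written by the host, more than 4GB of
# I/O observed at the drives -- a ratio of more than 2:1, and it only gets
# worse as the write size shrinks.
print(amplification_ratio(2 * GB, 4 * GB))    # 2.0, a lower bound here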