Writes to ZFS objects have significant data and metadata implications because of the ZFS copy-on-write implementation. As data is written into a file object, for example, the update must eventually be written to a new location on physical disk, and all of the metadata (from the uberblock down to that object) must be updated and rewritten to new locations as well. While the changes sit in cache they can be consolidated, but once they are written out to disk, any further change makes that recent write obsolete and requires everything to be written yet again to another new location on the disk.
Batching transactions for 5 seconds (the trigger discussed in the ZFS documentation) is therefore essential to limiting the amount of redundant rewriting that reaches the physical disks. Keeping a disk 100% busy writing mostly the same data over and over makes far less sense than collecting a group of changes in cache and writing them out efficiently once per trigger period.
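To make the mechanism concrete, here is a little toy model (plain Python, nothing to do with the actual ZFS code) of a copy-on-write block tree: rewriting any data block also dirties every ancestor block up to the root, and batching many updates into one flush lets those shared ancestors be written once instead of once per update. The fanout, depth, and update counts are made up purely for illustration.

    """Toy model of copy-on-write write amplification (illustration only)."""

    FANOUT = 128          # assumed pointers per indirect block
    DEPTH = 4             # assumed levels of indirect blocks above the data

    def dirtied_blocks(leaf: int) -> set[tuple[int, int]]:
        """Blocks that must be rewritten to new locations when one data
        block changes: the block itself plus its whole ancestor chain."""
        blocks = set()
        index = leaf
        for level in range(DEPTH + 1):      # level 0 = data, DEPTH = root
            blocks.add((level, index))
            index //= FANOUT                # parent indirect block
        return blocks

    def physical_writes(leaves: list[int], batch: bool) -> int:
        """Physical block writes needed for a series of data-block updates.

        batch=False: each update is pushed to disk immediately, so shared
                     ancestor blocks get rewritten again and again.
        batch=True:  updates accumulate in cache for one transaction group,
                     so each dirty block is written only once at the flush.
        """
        if batch:
            dirty: set[tuple[int, int]] = set()
            for leaf in leaves:
                dirty |= dirtied_blocks(leaf)
            return len(dirty)
        return sum(len(dirtied_blocks(leaf)) for leaf in leaves)

    if __name__ == "__main__":
        updates = list(range(1000))         # 1000 small sequential updates
        print("flush per update :", physical_writes(updates, batch=False))
        print("one batched flush:", physical_writes(updates, batch=True))

With these made-up parameters the per-update flushes cost 5000 block writes while the single batched flush costs about 1011, which is the whole point of the txg trigger.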
Even with this optimization, our experience with small sequential writes (4KB or less) to zvols that have been previously written (to ensure the space is actually mapped on the physical disk) shows bandwidth values that are less than 10% of comparable larger (128KB or larger) writes.
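If you want to reproduce that kind of comparison yourself, a rough micro-benchmark along these lines will show the difference between small and large sequential rewrites. This is not the tool we used for the numbers above; the zvol path is a placeholder, the target must already be written/allocated and at least TOTAL bytes in size, and the absolute numbers will of course depend on your pool layout.

    import os
    import time

    TARGET = "/dev/zvol/rdsk/tank/vol0"   # hypothetical zvol device path
    TOTAL = 256 * 1024 * 1024             # bytes rewritten per run

    def sequential_write(chunk_size: int) -> float:
        """Rewrite TOTAL bytes sequentially in chunk_size pieces, return MB/s."""
        buf = b"\xa5" * chunk_size
        fd = os.open(TARGET, os.O_WRONLY)
        start = time.monotonic()
        written = 0
        while written < TOTAL:
            written += os.write(fd, buf)
        os.fsync(fd)        # push everything out (may be redundant on a raw device)
        elapsed = time.monotonic() - start
        os.close(fd)
        return (written / (1024 * 1024)) / elapsed

    if __name__ == "__main__":
        for size in (4 * 1024, 128 * 1024, 1024 * 1024):
            print(f"{size // 1024:>5} KB chunks: {sequential_write(size):8.1f} MB/s")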
You can see this behavior dramatically if you compare the amount of host-initiated write data (front-end data) with the actual amount of I/O performed to the physical disks (both reads and writes) to satisfy the host's front-end requests. For example, doing sequential 1MB writes to a (previously written) zvol built as a simple concatenation of 5 FC drives in a JBOD, writing 2GB of data induced more than 4GB of I/O to the drives; with smaller write sizes this ratio gets progressively worse.
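For reference, the back-of-the-envelope arithmetic for that last example (the back-end figure is whatever you observe at the drives with iostat or zpool iostat while the workload runs; the numbers here are just the ones quoted above):

    front_end_bytes = 2 * 2**30   # 2 GB of host-initiated 1 MB sequential writes
    back_end_bytes = 4 * 2**30    # > 4 GB of read + write I/O seen at the drives
    print(f"back-end / front-end I/O ratio: > {back_end_bytes / front_end_bytes:.1f}x")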
 
 