On Sun, 27 Dec 2009, Tim Cook wrote:
> That is ONLY true when there's significant free space available/a
> fresh pool. Once those files have been deleted and the blocks put
> back into the free pool, they're no longer "sequential" on disk,
> they're all over the disk. So it makes a VERY big difference. I'm
> not sure why you'd be shocked someone would bring this up.
While I don't know what zfs actually does, I do know that it performs
large disk allocations (e.g. 1MB) and then parcels 128K zfs blocks
from those allocations. If the zfs designers are wise, then they will
use knowledge of sequential access to ensure that all of the 128K
blocks from a metaslab allocation are pre-assigned for use by that
file, and they will try to choose metaslabs which are followed by free
metaslabs, or close to other free metaslabs. This approach would tend
to limit the sequential-access damage caused by COW and free block
fragmentation on a "dirty" disk.
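To make the idea concrete, here is a toy sketch (not actual ZFS code, and the sizes are just the illustrative ones from above) of grabbing a large contiguous slab and parceling fixed-size blocks out of it so that one file's blocks stay adjacent on disk even when writers interleave:

```python
SLAB_SIZE = 1 << 20      # 1 MB slab, as in the example above
BLOCK_SIZE = 128 << 10   # 128K blocks

class SlabAllocator:
    """Toy model: pre-assign each slab's blocks to a single file."""

    def __init__(self):
        self.next_slab = 0   # next free slab offset on the "disk"
        self.per_file = {}   # file -> (slab_base, blocks_used)

    def alloc_block(self, file_id):
        """Return the disk offset for file_id's next 128K block."""
        base, used = self.per_file.get(file_id, (None, 0))
        if base is None or used * BLOCK_SIZE >= SLAB_SIZE:
            # Reserve a fresh slab and dedicate it to this file.
            base, used = self.next_slab, 0
            self.next_slab += SLAB_SIZE
        self.per_file[file_id] = (base, used + 1)
        return base + used * BLOCK_SIZE

# Two files writing concurrently, blocks allocated in interleaved order:
alloc = SlabAllocator()
offsets = {"a": [], "b": []}
for f in ["a", "b", "a", "b", "a", "b"]:
    offsets[f].append(alloc.alloc_block(f))
# Each file's blocks come out contiguous despite the interleaving.
```

Real allocators obviously have to cope with freeing, fragmentation, and full slabs, but this shows why pre-assigning a slab to one file preserves sequential layout under concurrent writers.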
This sort of planning is not terribly different from detecting
sequential read I/O and scheduling data reads in advance of
application requirements. If you can intelligently pre-fetch data
blocks, then you can certainly intelligently pre-allocate data blocks.
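The detection half of that analogy is simple enough to sketch (again, a hedged toy, not ZFS internals): watch for consecutive block numbers and only act once a run is established. The same trigger could drive read-ahead or write pre-allocation.

```python
class SequentialDetector:
    """Flag a stream as sequential after `trigger` consecutive blocks."""

    def __init__(self, trigger=3):
        self.last = None     # last block number seen
        self.run = 0         # length of current consecutive run
        self.trigger = trigger

    def access(self, block_no):
        """Record an access; return True once the pattern is sequential."""
        if self.last is not None and block_no == self.last + 1:
            self.run += 1
        else:
            self.run = 1     # run broken (or first access): start over
        self.last = block_no
        return self.run >= self.trigger

# Five consecutive block reads: the detector fires on the third.
d = SequentialDetector()
results = [d.access(i) for i in range(5)]
```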
Today I ran an interesting (to me) test: two copies of iozone at once
on huge (up to 64GB) files. The results surprised me: the reported
data rates from iozone did not drop very much (e.g. a single-process
write rate of 359MB/second dropped to 298MB/second with two
processes). This strongly suggests that zfs does quite a lot of smart
work when writing files and that it is optimized for several/many
writers rather than just one.
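Quick arithmetic on those figures (assuming, as I read it, that iozone reports a per-process rate):

```python
single = 359.0    # MB/s, one writer
per_proc = 298.0  # MB/s each, two concurrent writers

# Per-process slowdown when a second writer is added.
drop_pct = (single - per_proc) / single * 100

# Combined throughput with two writers, if the rates are per-process.
aggregate = 2 * per_proc

print(f"per-process drop: {drop_pct:.0f}%")
print(f"aggregate: {aggregate:.0f} MB/s vs {single:.0f} MB/s single")
```

Under that reading, each writer only lost about 17%, so the two writers together pushed far more data than one writer alone could.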
Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss