On Sun, 27 Dec 2009, Tim Cook wrote:

That is ONLY true when there's significant free space available or a fresh pool.  Once those files have been deleted and the blocks put back into the free pool, they're no longer "sequential" on disk; they're scattered all over the disk.  So it makes a VERY big difference.  I'm not sure why you'd be shocked someone would bring this up.

While I don't know what zfs actually does, I do know that it performs large disk allocations (e.g. 1MB) and then parcels out 128K zfs blocks from those allocations. If the zfs designers are wise, then they will use knowledge of sequential access to ensure that all of the 128K blocks from a metaslab allocation are pre-assigned for use by that file, and they will try to choose metaslabs that are followed by free metaslabs, or that are close to other free metaslabs. This approach would tend to limit the sequential-access damage caused by COW and free-block fragmentation on a "dirty" disk.
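
To make the idea concrete, here is a minimal C sketch (not ZFS source) of an allocator that carves 128K blocks out of 1MB slabs and reserves each slab for a single file, so a sequentially written file ends up with physically adjacent blocks. Everything here, including the slab_t structure and file_alloc(), is an assumption invented for illustration.

/* Toy sketch (not ZFS code): carve fixed 128K "blocks" out of larger
 * 1MB "slabs", reserving each slab for a single file so that a
 * sequentially written file gets contiguous blocks on disk. */
#include <stdio.h>
#include <stdint.h>

#define SLAB_SIZE       (1024 * 1024)   /* 1MB contiguous allocation     */
#define BLOCK_SIZE      (128 * 1024)    /* 128K zfs-style block          */
#define BLOCKS_PER_SLAB (SLAB_SIZE / BLOCK_SIZE)

typedef struct {
    uint64_t base;       /* starting disk offset of this slab            */
    int      owner;      /* file id the slab is reserved for, -1 = free  */
    int      next_block; /* next unused 128K block within the slab       */
} slab_t;

/* Return the disk offset for the file's next block.  A file keeps
 * drawing blocks from "its" slab until the slab is exhausted, which
 * keeps the file's blocks physically adjacent. */
static uint64_t file_alloc(slab_t *slabs, int nslabs, int file_id)
{
    for (int i = 0; i < nslabs; i++) {
        slab_t *s = &slabs[i];
        if (s->owner == file_id && s->next_block < BLOCKS_PER_SLAB)
            return s->base + (uint64_t)s->next_block++ * BLOCK_SIZE;
    }
    /* No partially used slab for this file: claim the next free one
     * (crudely, the lowest-indexed one; a smarter allocator would
     * prefer a slab whose neighbours are also free). */
    for (int i = 0; i < nslabs; i++) {
        slab_t *s = &slabs[i];
        if (s->owner == -1) {
            s->owner = file_id;
            s->next_block = 1;
            return s->base;
        }
    }
    return UINT64_MAX;   /* pool "full" */
}

int main(void)
{
    slab_t slabs[4];
    for (int i = 0; i < 4; i++)
        slabs[i] = (slab_t){ .base = (uint64_t)i * SLAB_SIZE,
                             .owner = -1, .next_block = 0 };

    /* Two files writing at once still each get contiguous runs. */
    for (int n = 0; n < 10; n++) {
        int file = n % 2;
        printf("file %d -> offset %llu\n", file,
               (unsigned long long)file_alloc(slabs, 4, file));
    }
    return 0;
}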

This sort of planning is not terribly different from detecting sequential read I/O and scheduling data reads in advance of application requirements. If you can intelligently pre-fetch data blocks, then you can certainly intelligently pre-allocate data blocks.
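
As a rough sketch of that analogy, the toy detector below watches for a run of consecutive block offsets and, once the run is long enough, starts working ahead of the application. The stream_t structure, the thresholds, and the prefetch() stand-in are all made up for the example; they are not ZFS's actual prefetch machinery.

/* Minimal sketch of the analogy above: detect a sequential access
 * pattern and act ahead of it.  A real implementation would also
 * avoid re-issuing overlapping prefetches. */
#include <stdio.h>
#include <stdint.h>

#define BLOCK_SIZE    (128 * 1024)
#define SEQ_THRESHOLD 4         /* consecutive blocks before we act    */
#define READAHEAD     8         /* blocks to handle ahead of the app   */

typedef struct {
    uint64_t next_expected;     /* offset expected if access is sequential */
    int      run_length;        /* consecutive sequential accesses so far  */
} stream_t;

static void prefetch(uint64_t offset)   /* stand-in for real prefetch/prealloc */
{
    printf("  prefetch/preallocate block at %llu\n",
           (unsigned long long)offset);
}

/* Called on every block access; once enough consecutive accesses are
 * seen, schedule work for the blocks the application will want next. */
static void access_block(stream_t *st, uint64_t offset)
{
    if (offset == st->next_expected)
        st->run_length++;
    else
        st->run_length = 1;     /* pattern broken: start a new run */

    st->next_expected = offset + BLOCK_SIZE;

    if (st->run_length >= SEQ_THRESHOLD)
        for (int i = 0; i < READAHEAD; i++)
            prefetch(st->next_expected + (uint64_t)i * BLOCK_SIZE);
}

int main(void)
{
    stream_t st = { 0, 0 };
    for (uint64_t blk = 0; blk < 6; blk++) {
        printf("app reads block at %llu\n",
               (unsigned long long)(blk * BLOCK_SIZE));
        access_block(&st, blk * BLOCK_SIZE);
    }
    return 0;
}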

Today I did an interesting (to me) test where I ran two copies of iozone at once on huge (up to 64GB) files. The results amazed me somewhat: the data rates reported by iozone did not drop very much (e.g. a single-process write rate of 359 MB/second dropped to 298 MB/second with two processes running). This clearly showed that zfs does quite a lot of smart things when writing files and that it is optimized for several or many writers rather than just one.
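
For anyone who wants to reproduce something in the same spirit without iozone, a crude approximation is to fork two writers that each stream a large file and report their own throughput. This is only a sketch: the file names, sizes and write pattern below are arbitrary, and it measures far less than iozone does.

/* Rough sketch of a concurrent-writer test using plain write() calls
 * instead of iozone.  Each child streams 1 GiB of 128K writes to its
 * own file and prints its own MB/s. */
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/wait.h>
#include <sys/time.h>

#define CHUNK   (128 * 1024)            /* 128K writes                  */
#define NCHUNKS (8192)                  /* 8192 * 128K = 1 GiB per file */

static void writer(const char *path)
{
    static char buf[CHUNK];
    memset(buf, 'x', sizeof(buf));

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); _exit(1); }

    struct timeval t0, t1;
    gettimeofday(&t0, NULL);
    for (int i = 0; i < NCHUNKS; i++)
        if (write(fd, buf, CHUNK) != CHUNK) { perror("write"); _exit(1); }
    fsync(fd);
    gettimeofday(&t1, NULL);
    close(fd);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
    double mb   = (double)NCHUNKS * CHUNK / (1024.0 * 1024.0);
    printf("%s: %.1f MB/s\n", path, mb / secs);
    _exit(0);
}

int main(void)
{
    /* Fork two writers so they run concurrently, then wait for both. */
    const char *paths[2] = { "writer0.dat", "writer1.dat" };
    for (int i = 0; i < 2; i++)
        if (fork() == 0)
            writer(paths[i]);
    while (wait(NULL) > 0)
        ;
    return 0;
}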

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
