Tim Cook writes:

 > On Sun, Dec 27, 2009 at 6:43 PM, Bob Friesenhahn <
 > bfrie...@simple.dallas.tx.us> wrote:
 > 
 > > On Sun, 27 Dec 2009, Tim Cook wrote:
 > >
 > >>
 > >> That is ONLY true when there's significant free space available/a fresh
 > >> pool.  Once those files have been deleted and the blocks put back into the
 > >> free pool, they're no longer "sequential" on disk, they're all over the
 > >> disk.  So it makes a VERY big difference.  I'm not sure why you'd be
 > >> shocked someone would bring this up.
 > >>
 > >
 > > While I don't know what zfs actually does, I do know that it performs large
 > > disk allocations (e.g. 1MB) and then parcels 128K zfs blocks from those
 > > allocations.  If the zfs designers are wise, then they will use knowledge of
 > > sequential access to ensure that all of the 128K blocks from a metaslab
 > > allocation are pre-assigned for use by that file, and they will try to
 > > choose metaslabs which are followed by free metaslabs, or close to other
 > > free metaslabs.  This approach would tend to limit the sequential-access
 > > damage caused by COW and free block fragmentation on a "dirty" disk.
 > >
 > >
 > How is that going to prevent blocks being spread all over the disk when
 > you've got files several GB in size being written concurrently and deleted
 > at random?  And then throw in a mix of small files as well, kiss that
 > goodbye.
 > 
 > 

Deleting big files creates big chunks of space for reuse,
which is a great way to clean up the layout. Within a
metaslab, ZFS uses cursors to bunch small objects closer
together.

        
        http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/metaslab.c#501
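
To make the cursor idea a bit more concrete, here is a very
simplified sketch (invented names, not the code behind the link
above, which works against the real in-core tree of free segments)
of first-fit allocation with one cursor per power-of-two size class:

/*
 * Simplified illustration only; not the actual metaslab.c code.
 * One free list per metaslab, plus one cursor per power-of-two
 * size class; first fit starts at the cursor for the requested
 * class, so small allocations keep landing next to the previous
 * small allocation.
 */
#include <stdint.h>
#include <stddef.h>

#define MS_SIZE_CLASSES 64            /* one cursor per power of two */

typedef struct ms_extent {            /* a free segment in the slab  */
    uint64_t          me_start;
    uint64_t          me_size;
    struct ms_extent *me_next;        /* list kept sorted by offset  */
} ms_extent_t;

typedef struct ms_sketch {
    ms_extent_t *ms_free;                     /* sorted free list    */
    uint64_t     ms_cursor[MS_SIZE_CLASSES];  /* per-class cursors   */
} ms_sketch_t;

static int
ms_class(uint64_t size)               /* floor(log2(size))           */
{
    int c = 0;
    while (size >>= 1)
        c++;
    return (c);
}

/*
 * Return the offset of a freshly carved chunk, or UINT64_MAX if this
 * metaslab cannot satisfy the request.  Pass 0 searches from the
 * cursor for this size class; pass 1 wraps around to offset 0.
 */
uint64_t
ms_sketch_alloc(ms_sketch_t *msp, uint64_t size)
{
    uint64_t *cursor = &msp->ms_cursor[ms_class(size)];
    int pass;

    for (pass = 0; pass < 2; pass++) {
        uint64_t start = (pass == 0) ? *cursor : 0;
        ms_extent_t *me;

        for (me = msp->ms_free; me != NULL; me = me->me_next) {
            /* skip segments before the cursor, or too small */
            if (me->me_start < start || me->me_size < size)
                continue;
            *cursor = me->me_start + size;  /* remember where we stopped */
            me->me_start += size;           /* shrink the free segment   */
            me->me_size  -= size;
            return (*cursor - size);
        }
    }
    return (UINT64_MAX);
}

The only point of the sketch is that a stream of small writes keeps
allocating near its own cursor rather than wherever the last large
write happened to end.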

 > 
 > > This sort of planning is not terribly different than detecting sequential
 > > read I/O and scheduling data reads in advance of application requirements.
 > >  If you can intelligently pre-fetch data blocks, then you can certainly
 > > intelligently pre-allocate data blocks.
 > >
 > >
 > Pre-allocating data blocks is also not going to cure head seek and the
 > latency it induces on slow 7200/5400RPM drives.
 > 
 > 
 > 
 > 
 > > Today I did an interesting (to me) test where I ran two copies of iozone at
 > > once on huge (up to 64GB) files.  The results were somewhat amazing to me.
 > >  The cause of the amazement was that I noticed that the reported data rates
 > > from iozone did not drop very much (e.g. a single-process write rate of
 > > 359MB/second dropped to 298MB/second with two processes).  This clearly
 > > showed that zfs is doing quite a lot of smart things when writing files and
 > > that it is optimized for several/many writers rather than just one.
 > >
 > >
 > On a new, empty pool, or a pool that's been filled completely and emptied
 > several times?  It's not amazing to me on a new pool.  I would be surprised
 > to see you accomplish this feat repeatedly after filling and emptying the
 > drives.  It's a drawback of every implementation of copy-on-write I've ever
 > seen.  By its very nature, I have no idea how you would avoid it.
 > 

If you empty the drives, you're back to all free space:

        http://blogs.sun.com/bonwick/entry/space_maps
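
For anyone who has not read that entry: a space map is essentially
an append-only log of allocated and freed extents that gets replayed
(and periodically condensed) into an in-core picture of the free
space.  A toy version, with made-up names, looks roughly like this:

/*
 * Toy sketch of the space-map idea; not the real ZFS on-disk format.
 * Allocations and frees are appended to a log of extents, and the
 * current amount of allocated space is recovered by replaying it.
 */
#include <stdint.h>
#include <stdio.h>

typedef enum { SM_ALLOC, SM_FREE } sm_type_t;

typedef struct sm_entry {
    sm_type_t se_type;       /* was this extent allocated or freed? */
    uint64_t  se_offset;     /* start of the extent                 */
    uint64_t  se_size;       /* length of the extent                */
} sm_entry_t;

/* Appending is the only write path, so recording an alloc or a free
 * is O(1) no matter how fragmented the space already is. */
static sm_entry_t sm_log[1024];
static int        sm_count;

void
sm_append(sm_type_t type, uint64_t offset, uint64_t size)
{
    if (sm_count < 1024)
        sm_log[sm_count++] = (sm_entry_t){ type, offset, size };
}

/* Replay the log; the real code folds it into an offset-sorted
 * in-core structure, this toy only keeps a running total. */
uint64_t
sm_replay_allocated(void)
{
    uint64_t allocated = 0;
    int i;

    for (i = 0; i < sm_count; i++) {
        if (sm_log[i].se_type == SM_ALLOC)
            allocated += sm_log[i].se_size;
        else
            allocated -= sm_log[i].se_size;
    }
    return (allocated);
}

int
main(void)
{
    sm_append(SM_ALLOC, 0, 131072);       /* write a 128K block   */
    sm_append(SM_ALLOC, 131072, 131072);  /* and another one      */
    sm_append(SM_FREE, 0, 131072);        /* delete the first one */
    printf("allocated: %llu bytes\n",
        (unsigned long long)sm_replay_allocated());
    return (0);
}

Once every allocation in the log has a matching free record, the
replay comes out empty, which is exactly the "back to all free
space" case above.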

If you leave yourself a nice cushion of free space, and if
your profile of object sizes does not radically change
over time, I think people should be fine when it comes to
free-space fragmentation issues. That said, slab and block
selection is still on our radar for improvements.
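
As a practical aside, the cushion part is easy to keep an eye on.
Something along these lines (statvfs() is standard; the mount point
and the 20% threshold are just example values) will tell you when a
dataset is getting close:

/* Hypothetical helper: warn when free space drops below a cushion. */
#include <sys/statvfs.h>
#include <stdio.h>

int
main(void)
{
    const char  *path = "/tank";     /* example mount point      */
    const double cushion = 0.20;     /* keep at least 20% free   */
    struct statvfs vs;

    if (statvfs(path, &vs) != 0) {
        perror("statvfs");
        return (1);
    }

    /* f_bavail and f_blocks are in the same f_frsize units. */
    double free_frac = (double)vs.f_bavail / (double)vs.f_blocks;
    printf("%s: %.1f%% free\n", path, free_frac * 100.0);
    if (free_frac < cushion)
        printf("warning: below the %.0f%% cushion\n", cushion * 100.0);
    return (0);
}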

-r

