On Sat, 2009-08-08 at 16:09, Mike Gerdts wrote:
> Right... but ZFS doesn't understand your application. The reason that
> a file system would put files that are in the same directory in the
> same general area on a disk is to minimize seek time. I would argue
> that seek time doesn't matter a whole lot here - at least from the
> vantage point of ZFS. The LUNs that you have presented from the filer
> are probably RAID6 across many disks.
Yes. Raid4DP. 16-drive arrays, 42 drives in total (one hot spare).

> ZFS seems to be doing a 4 way
> stripe (or are you mirroring or raidz?).

Here's the pool (no zfs raid):

  pool: space
 state: ONLINE
 scrub: none requested
config:

        NAME                                       STATE     READ WRITE CKSUM
        space                                      ONLINE       0     0     0
          c4t60A98000433469764E4A2D456A644A74d0    ONLINE       0     0     0
          c4t60A98000433469764E4A2D456A696579d0    ONLINE       0     0     0
          c4t60A98000433469764E4A476D2F6B385Ad0    ONLINE       0     0     0
          c4t60A98000433469764E4A476D2F664E4Fd0    ONLINE       0     0     0

errors: No known data errors

> Assuming you are doing
> something like a 7+2 RAID6 on the back end, the contents would be
> spread across 36 drives.[1] The trick to making this perform well is
> to have 36 * N worker threads. Mail is a great thing to keep those
> spindles kinda busy while getting decent performance. A small number
> of sequential readers - particularly with small files where you can't
> do a reasonable job with read-ahead - has little chance of keeping
> that number of drives busy.

The server is also a Sun T2000 (sun4v).

> 1. Or you might have 4 LUNs presented from one 4+1 RAID5, in which case
> you may be forcing more head movement because ZFS thinks it can speed
> things up by striping data across the LUNs.
>
> ZFS can recognize a database (or other application) doing a sequential
> read on a large file. While data located sequentially on disk can be
> helpful for reads, this is much less important when the pool sits
> across tens of disks. This is because it has the ability to spread
> the iops across lots of disks, potentially reading a heavily
> fragmented file much faster than a purely sequential file.
>
> In either case, your backup application is competing for iops (and
> seeks) with other workload. With the NetApp backend there are likely
> other applications on the same aggregate that are forcing head
> movement away from any data belonging to these LUNs.

Email makes up about 98% of our IP SAN; there are only a couple of other
apps on it that require block storage. We run "reallocate" jobs nightly
to keep the LUNs laid out sequentially within the NetApp storage pool
(aggregate), since its filesystem is copy-on-write. (A rough sketch of
such a schedule is at the end of this message.)

> > And in the back of my mind I'm also thinking that you have to
> > rebuild/repair the database once in a while to improve performance.
>
> Certainly. Databases become fragmented and are reorganized to fix this.
>
> > And in my case, since the filesystem is the database, I want to do that
> > to zfs!
> >
> > At least that's what I'm thinking, however, and I always come back to
> > this, I'm not certain what is causing my problem. I need certainty
> > before taking action on the production system.
>
> Most databases are written in such a way that they can be optimized
> for sequential reads (table scans) and for backups, whether on raw
> disk or on a file system. The more advanced the database is, the more
> likely it is to ask the file system to get out of its way and *not* do
> anything fancy.
>
> It seems that cyrus was optimized for operations that make sense for a
> mail program (deliver messages, retrieve messages, delete messages)
> and nothing else. I would argue that any application that creates
> lots of tiny files is not optimized for backing up using a small
> number of streams.

Oh yes. Lots of small files is the backup nightmare.

--
Ed
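
P.S. To make the "36 * N worker threads" idea above concrete, here is a
minimal sketch of driving a backup with many concurrent streams instead
of one. It is not our actual backup job; the paths, the per-directory
tar granularity, and the count of 32 streams are all made-up assumptions:

  #!/bin/sh
  # Hypothetical sketch, not our real backup script: tar each top-level
  # mail directory as its own stream, MAXJOBS streams at a time, so the
  # spindles behind the LUNs always have many concurrent readers.
  SRC=/space/mail        # assumed pool mountpoint
  DST=/backup/mail       # assumed staging area
  MAXJOBS=32             # assumed stream count

  n=0
  for d in "$SRC"/*; do
      name=`basename "$d"`
      tar cf "$DST/$name.tar" "$d" &
      n=`expr $n + 1`
      if [ "$n" -ge "$MAXJOBS" ]; then
          wait      # let this batch of streams drain before starting more
          n=0
      fi
  done
  wait

The batch-and-wait throttle is crude (each batch waits on its slowest
directory), but it sticks to plain Bourne shell rather than assuming
xargs -P or job control is available on the T2000.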
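
P.P.S. Since I mentioned the nightly reallocate jobs, this is roughly
what scheduling one looks like on a 7-mode filer. The volume path and
the schedule string are placeholders, and the exact flags vary by
Data ONTAP release, so check the reallocate man page on your own filer
rather than trusting this verbatim:

  filer> reallocate on
  filer> reallocate schedule -s "0 1 * *" /vol/mail_luns   # 01:00 nightly (placeholder path)
  filer> reallocate status /vol/mail_luns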