On Sat, Aug 8, 2009 at 3:25 PM, Ed Spencer <ed_spen...@umanitoba.ca> wrote:
>
> On Sat, 2009-08-08 at 15:12, Mike Gerdts wrote:
>
>> The DBAs that I know use files that are at least hundreds of
>> megabytes in size.  Your problem is very different.
> Yes, definitely.
>
> I'm relating records in a table to my small files because our email
> system treats the filesystem as a database.

Right... but ZFS doesn't understand your application.  The reason that
a file system would put files that are in the same directory in the
same general area on a disk is to minimize seek time.  I would argue
that seek time doesn't matter a whole lot here - at least from the
vantage point of ZFS.  The LUNs that you have presented from the filer
are probably RAID6 across many disks.  ZFS seems to be doing a 4-way
stripe (or are you mirroring or raidz?).  Assuming you are doing
something like a 7+2 RAID6 on the back end, the contents would be
spread across 36 drives.[1]  The trick to making this perform well is
to have 36 * N worker threads.  Mail is a great thing to keep those
spindles kinda busy while getting decent performance.  A small number
of sequential readers - particularly with small files where you can't
do a reasonable job with read-ahead - has little chance of keeping
that number of drives busy.

1. Or you might have 4 LUNs presented from one 4+1 RAID5, in which
case you may be forcing more head movement because ZFS thinks it can
speed things up by striping data across the LUNs.
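
To put the worker-thread idea above in concrete terms, here is a
minimal sketch (mine, not anything from Cyrus or your backup tool): a
fixed pool of POSIX threads pulling file names off a shared list so
that many small reads are in flight at once.  NTHREADS and the file
list are placeholders; you would size the pool to roughly
spindles * N.

#include <pthread.h>
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

#define	NTHREADS 36	/* e.g. one per back-end spindle */

static char **files;		/* file names to read (from argv) */
static int nfiles;		/* how many there are */
static int next_file;		/* index of the next unread file */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *
reader(void *unused)
{
	char buf[8192];

	for (;;) {
		int i, fd;

		/* grab the next file name off the shared list */
		pthread_mutex_lock(&lock);
		i = (next_file < nfiles) ? next_file++ : -1;
		pthread_mutex_unlock(&lock);
		if (i == -1)
			return (NULL);	/* list drained */

		fd = open(files[i], O_RDONLY);
		if (fd == -1)
			continue;
		while (read(fd, buf, sizeof (buf)) > 0)
			;		/* read the whole file */
		(void) close(fd);
	}
}

int
main(int argc, char **argv)
{
	pthread_t tid[NTHREADS];
	int i;

	files = &argv[1];
	nfiles = argc - 1;
	for (i = 0; i < NTHREADS; i++)
		(void) pthread_create(&tid[i], NULL, reader, NULL);
	for (i = 0; i < NTHREADS; i++)
		(void) pthread_join(tid[i], NULL);
	return (0);
}

With something shaped like this, the back end sees dozens of
concurrent requests instead of one, which is what it takes to keep 36
spindles busy.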

ZFS can recognize a database (or other application) doing a sequential
read on a large file.  While data laid out sequentially on disk can be
helpful for reads, this matters much less when the pool sits across
tens of disks.  Because ZFS can spread the iops across lots of
spindles, it can potentially read a heavily fragmented file faster
than a single disk could read a purely sequential one.

In either case, your backup application is competing for iops (and
seeks) with other workloads.  With the NetApp back end there are
likely other applications on the same aggregate forcing head movement
away from any data belonging to these LUNs.

> And in the back of my mind I'm also thinking that you have to
> rebuild/repair the database once in a while to improve performance.

Certainly.  Databases become fragmented and are reorganized to fix this.

> And in my case, since the filesystem is the database, I want to do that
> to zfs!
>
> At least that's what I'm thinking, however, and I always come back to
> this: I'm not certain what is causing my problem. I need certainty
> before taking action on the production system.

Most databases are written in such a way that they can be optimized
for sequential reads (table scans) and for backups, whether on raw
disk or on a file system.  The more advanced the database is, the more
likely it is to ask the file system to get out of its way and *not* do
anything fancy.
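
For illustration only (my sketch, nothing any particular database
actually ships): on Solaris the "get out of the way" request is
directio(3C).  A database might open its data file and switch off
caching and read-ahead like this -- the path is made up, and note
that the hint fails on file systems that do not support it (ZFS
among them):

#include <sys/types.h>
#include <sys/fcntl.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
	int fd = open("/db/datafile", O_RDWR);	/* hypothetical path */

	if (fd == -1) {
		perror("open");
		return (1);
	}
	/*
	 * Advise the file system to bypass its buffer cache for this
	 * file: no caching, no read-ahead, just the I/O we request.
	 */
	if (directio(fd, DIRECTIO_ON) == -1)
		perror("directio");	/* unsupported on ZFS */
	/* ... the database does its own buffering and scheduling ... */
	(void) close(fd);
	return (0);
}

The point is just that a serious database tells the file system its
intent; a one-file-per-message mail store has no comparable way to do
so.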

It seems that Cyrus was optimized for operations that make sense for a
mail program (deliver messages, retrieve messages, delete messages)
and nothing else.  I would argue that any application that creates
lots of tiny files is not optimized for backing up using a small
number of streams.

-- 
Mike Gerdts
http://mgerdts.blogspot.com/