On Sat, Aug 8, 2009 at 3:25 PM, Ed Spencer <ed_spen...@umanitoba.ca> wrote:
>
> On Sat, 2009-08-08 at 15:12, Mike Gerdts wrote:
>
>> The DBAs that I know use files that are at least hundreds of
>> megabytes in size. Your problem is very different.
>
> Yes, definitely.
>
> I'm relating records in a table to my small files because our email
> system treats the filesystem as a database.
Right... but ZFS doesn't understand your application. The reason that a
file system would put files that are in the same directory in the same
general area on a disk is to minimize seek time. I would argue that seek
time doesn't matter a whole lot here - at least from the vantage point
of ZFS. The LUNs that you have presented from the filer are probably
RAID6 across many disks. ZFS seems to be doing a 4-way stripe (or are
you mirroring or using raidz?). Assuming you are doing something like a
7+2 RAID6 on the back end, the contents would be spread across 36
drives (4 LUNs x 9 drives each).[1]

The trick to making this perform well is to have 36 * N worker threads
(see the sketch at the end of this message). A mail workload is well
suited to keeping those spindles busy while still getting decent
performance. A small number of sequential readers - particularly with
small files, where read-ahead can't accomplish much - has little chance
of keeping that many drives busy.

1. Or you might have 4 LUNs presented from one 4+1 RAID5, in which case
you may be forcing more head movement because ZFS thinks it can speed
things up by striping data across the LUNs.

ZFS can recognize a database (or other application) doing a sequential
read on a large file. While data located sequentially on disk can be
helpful for reads, this matters much less when the pool sits across
tens of disks: ZFS can spread the iops across lots of spindles,
potentially reading a heavily fragmented file faster than a purely
sequential one.

In either case, your backup application is competing for iops (and
seeks) with the rest of the workload. With the NetApp back end, there
are likely other applications on the same aggregate that are forcing
head movement away from any data belonging to these LUNs.

> And in the back of my mind I'm also thinking that you have to
> rebuild/repair the database once in a while to improve performance.

Certainly. Databases become fragmented and are reorganized to fix this.

> And in my case, since the filesystem is the database, I want to do
> that to zfs!
>
> At least that's what I'm thinking. However, and I always come back to
> this, I'm not certain what is causing my problem. I need certainty
> before taking action on the production system.

Most databases are written in such a way that they can be optimized for
sequential reads (table scans) and for backups, whether on raw disk or
on a file system. The more advanced the database, the more likely it is
to ask the file system to get out of its way and *not* do anything
fancy. It seems that Cyrus was optimized for the operations that make
sense for a mail program (deliver messages, retrieve messages, delete
messages) and nothing else. I would argue that any application that
creates lots of tiny files is not optimized for backup with a small
number of streams.

--
Mike Gerdts
http://mgerdts.blogspot.com/
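P.S. To make the "36 * N worker threads" point concrete, here is a
rough sketch of the kind of multi-threaded reader a backup tool would
need. This is untested, the mailbox paths and worker count are made up,
and a real tool would write the data to a backup stream instead of
discarding it:

/* cc parallel_read.c -lpthread */
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define NWORKERS 36  /* roughly one reader per back-end spindle */

/* Stand-in for the real list of mail files; paths are hypothetical. */
static const char *files[] = {
    "/var/spool/imap/user/a/1.",
    "/var/spool/imap/user/a/2.",
    NULL
};
static int next_file;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *
reader(void *unused)
{
    char buf[8192];
    const char *path;
    ssize_t n;
    int fd;

    for (;;) {
        /* Grab the next file name under a lock. */
        pthread_mutex_lock(&lock);
        path = files[next_file];
        if (path != NULL)
            next_file++;
        pthread_mutex_unlock(&lock);
        if (path == NULL)
            return (NULL);

        if ((fd = open(path, O_RDONLY)) == -1)
            continue;
        /* Read the whole file; a backup would write it out here. */
        while ((n = read(fd, buf, sizeof (buf))) > 0)
            ;
        (void) close(fd);
    }
}

int
main(void)
{
    pthread_t tid[NWORKERS];
    int i;

    for (i = 0; i < NWORKERS; i++)
        (void) pthread_create(&tid[i], NULL, reader, NULL);
    for (i = 0; i < NWORKERS; i++)
        (void) pthread_join(tid[i], NULL);
    return (0);
}

The point is only that with 36 spindles behind the pool you want
something pulling on all of them at once; one or two streams of
open()/read()/close() on tiny files will leave most of the disks idle.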
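P.P.S. And to make the "get out of its way" point concrete: on Solaris
this is what directio(3C) is for. UFS honors it; ZFS does not. A
minimal, untested sketch, with a made-up data file path:

#include <sys/types.h>
#include <sys/fcntl.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
    /* Hypothetical database data file. */
    int fd = open("/u01/data/db1.dbf", O_RDWR);

    if (fd == -1) {
        perror("open");
        return (1);
    }
    /*
     * Ask the file system to skip its own caching and read-ahead;
     * the database will manage both itself.
     */
    if (directio(fd, DIRECTIO_ON) == -1)
        perror("directio");
    /* ... database I/O happens here ... */
    (void) close(fd);
    return (0);
}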