On Sat, Aug 8, 2009 at 12:51 PM, Ed Spencer <ed_spen...@umanitoba.ca> wrote:
>
> On Sat, 2009-08-08 at 09:17, Bob Friesenhahn wrote:
>> Many of us here already tested our own systems and found that under
>> some conditions ZFS was offering up only 30 MB/second for bulk data
>> reads regardless of how exotic our storage pool and hardware was.
>
> Just so we are using the same units of measurement: backup/copy
> throughput on our development mail server is 8.5 MB/sec. The people
> running our backups would be overjoyed with that performance.
>
> However, backup/copy throughput on our production mail server is
> 2.25 MB/sec.
>
> The underlying disks are 15000 RPM 146 GB FC drives. Our performance
> may be hampered somewhat because the LUNs are on a Network Appliance
> accessed via iSCSI, but not to the extent that we are seeing, and it
> does not account for the throughput difference between the development
> and production pools.

NetApp filers run WAFL - Write Anywhere File Layout. Even if ZFS
arranged everything perfectly (however that is defined), WAFL would
undo its hard work.

Since you are using iSCSI, I assume that you have disabled the Nagle
algorithm and increased tcp_xmit_hiwat and tcp_recv_hiwat. If not, go
do that now.
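For reference, a minimal sketch of those TCP tweaks on Solaris. The 2 MB
high-water values below are only illustrative, not a recommendation tuned
for your filer, and ndd settings do not persist across a reboot:

    # Check the current values first
    ndd -get /dev/tcp tcp_naglim_def
    ndd -get /dev/tcp tcp_xmit_hiwat
    ndd -get /dev/tcp tcp_recv_hiwat

    # Effectively disable Nagle by dropping the coalescing limit to 1 byte
    ndd -set /dev/tcp tcp_naglim_def 1

    # Raise the TCP send and receive buffers (example values)
    ndd -set /dev/tcp tcp_xmit_hiwat 2097152
    ndd -set /dev/tcp tcp_recv_hiwat 2097152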
> When I talk about fragmentation it's not in the normal sense. I'm not
> talking about blocks in a file not being sequential. I'm talking about
> files in a single directory that end up spread across the entire
> filesystem/pool.

It's tempting to think that if the files were in roughly the same area
of the block device that ZFS sees, reading them sequentially would at
least trigger read-ahead at the filer. I suspect that even a moderate
amount of file creation and deletion would make the I/O pattern random
enough (not purely sequential) that the back-end storage would have no
reasonable chance of recognizing it as a good time for read-ahead.

Further, the backup application is probably in a loop of:

    while there are more files in the directory
        if next file mtime > last backup time
            open file
            read file contents, send to backup stream
            close file
        end if
    end while

In other words, other I/O operations are interspersed between the
sequential data reads, some files are likely to be skipped, and there is
latency introduced by writing to the data stream. I would be surprised
to see any file system do intelligent read-ahead here. In short, lots of
small file operations make backups, and especially restores, go slowly.

More backup and restore streams will almost certainly help. Multiplex
the streams so that you can keep your tapes moving at a constant speed.

Do you have statistics on network utilization to ensure that you aren't
stressing it? Have you looked at iostat data to be sure that the
asvc_t + wsvc_t you are seeing supports the number of operations that
you need to perform? That is, if asvc_t + wsvc_t for a device adds up to
10 ms, a workload that waits for the completion of one I/O before
issuing the next will max out at 100 IOPS. Presumably ZFS should hide
some of this from you[1], but it does suggest that each backup stream
would be limited to about 100 files per second[2], because the read
request for one file does not happen before the close of the previous
file[3]. Since Cyrus stores each message as a separate file, this
suggests that 2.5 MB/s corresponds to an average mail message size of
about 25 KB.

1. Via metadata caching, read-ahead on file data reads, etc.
2. Assuming wsvc_t + asvc_t = 10 ms.
3. Assuming that NetWorker is about as smart as tar, zip, cpio, etc.
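To check the latency side of that arithmetic, something like the
following is enough (a sketch only; the 10 ms and 25 KB figures are the
assumptions from above, not measurements):

    # Per-device extended statistics, 5-second samples, idle devices skipped
    iostat -xnz 5

    # Back-of-the-envelope math for one serialized backup stream:
    #   wsvc_t + asvc_t ~= 10 ms per I/O     ->  1 / 0.010 s = ~100 ops/sec
    #   roughly one file per I/O round trip  ->  ~100 files/sec per stream
    #   2.5 MB/s at ~100 files/sec           ->  ~25 KB average message size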
> My problem right now is diagnosing the performance issues. I can't
> address them without understanding the underlying cause. There is a
> lack of tools to help in this area. There is also a lack of acceptance
> that I'm actually having a problem with ZFS. It's frustrating.

This is a prime example of why Sun needs to sell Analytics[4][5] as an
add-on to Solaris in general. This problem is just as hard to figure out
on Solaris as it is on Linux, Windows, etc. If Analytics were bundled
with Gold and above support contracts, it would be a very compelling
reason to shell out a few extra bucks for a better support contract.

4. http://blogs.sun.com/bmc/resource/cec_analytics.pdf
5. http://blogs.sun.com/brendan/category/Fishworks

> Anyone know how to significantly increase the performance of a ZFS
> filesystem without causing any downtime to an Enterprise email system
> used by 30,000 intolerant people, when you don't really know what is
> causing the performance issues in the first place? (Yeah, it sucks to
> be me!)

Hopefully I've helped find a couple of places to look...

-- 
Mike Gerdts
http://mgerdts.blogspot.com/