On Sat, 2009-08-08 at 15:05, Mike Gerdts wrote:
> On Sat, Aug 8, 2009 at 12:51 PM, Ed Spencer <ed_spen...@umanitoba.ca> wrote:
> >
> > On Sat, 2009-08-08 at 09:17, Bob Friesenhahn wrote:
> >> Many of us here already tested our own systems and found that under
> >> some conditions ZFS was offering up only 30MB/second for bulk data
> >> reads regardless of how exotic our storage pool and hardware was.
> >
> > Just so we are using the same units of measurement: backup/copy
> > throughput on our development mail server is 8.5 MB/sec. The people
> > running our backups would be overjoyed with that performance.
> >
> > However, backup/copy throughput on our production mail server is
> > 2.25 MB/sec.
> >
> > The underlying disk is 15000 RPM 146GB FC drives.
> > Our performance may be hampered somewhat because the LUNs are on a
> > Network Appliance accessed via iSCSI, but not to the extent that we
> > are seeing, and it does not account for the throughput difference
> > between the development and production pools.
>
> NetApp filers run WAFL - Write Anywhere File Layout. Even if ZFS
> arranged everything perfectly (however that is defined) WAFL would
> undo its hard work.
>
> Since you are using iSCSI, I assume that you have disabled the Nagle
> algorithm and increased tcp_xmit_hiwat and tcp_recv_hiwat. If not,
> go do that now.

We've tried many different iSCSI parameter changes on our development
server:
  - Jumbo frames
  - Disabling the Nagle algorithm

I'll double check next week on tcp_xmit_hiwat and tcp_recv_hiwat.
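For what it's worth, this is roughly how I plan to check (and, if we
haven't already, raise) those settings on the initiators. The values
below are only illustrative, not a recommendation for our workload, and
ndd changes don't persist across reboots, so they'd also need to go in
a startup script:

  # check the current TCP tuning (Solaris, global zone)
  ndd -get /dev/tcp tcp_xmit_hiwat
  ndd -get /dev/tcp tcp_recv_hiwat
  ndd -get /dev/tcp tcp_naglim_def      # 1 means Nagle is effectively off

  # example of raising the high-water marks (illustrative values only)
  ndd -set /dev/tcp tcp_xmit_hiwat 1048576
  ndd -set /dev/tcp tcp_recv_hiwat 1048576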
Nothing has made any real difference. We are only using about 5% of the
bandwidth on our IP SAN. We use two Cisco ethernet switches on the IP SAN.
The iSCSI initiators use MPxIO in a round-robin configuration.

> > When I talk about fragmentation it's not in the normal sense. I'm not
> > talking about blocks in a file not being sequential. I'm talking about
> > files in a single directory that end up spread across the entire
> > filesystem/pool.
>
> It's tempting to think that if the files were in roughly the same area
> of the block device that ZFS sees, then reading the files sequentially
> would at least trigger a read-ahead at the filer. I suspect that even
> a moderate amount of file creation and deletion would cause the I/O
> pattern to be random enough (not purely sequential) that the back-end
> storage would not have a reasonable chance of recognizing it as a good
> time for read-ahead. Further, the backup application is probably in a
> loop of:
>
>   while there are more files in the directory
>     if next file mtime > last backup time
>       open file
>       read file contents, send to backup stream
>       close file
>     end if
>   end while
>
> In other words, other I/O operations are interspersed between the
> sequential data reads, some files are likely to be skipped, and there
> is latency introduced by writing to the data stream. I would be
> surprised to see any file system do intelligent read-ahead here. In
> other words, lots of small file operations make backups and especially
> restores go slowly. More backup and restore streams will almost
> certainly help. Multiplex the streams so that you can keep your tapes
> moving at a constant speed.

We back up to disk first and then put it to tape later.

> Do you have statistics on network utilization to ensure that you
> aren't stressing it?
>
> Have you looked at iostat data to be sure that you are seeing asvc_t +
> wsvc_t that supports the number of operations that you need to
> perform? That is, if asvc_t + wsvc_t for a device adds up to 10 ms, a
> workload that waits for the completion of one I/O before issuing the
> next will max out at 100 iops. Presumably ZFS should hide some of
> this from you[1], but it does suggest that each backup stream would be
> limited to about 100 files per second[2]. This is because the read
> request for one file does not happen before the close of the previous
> file[3]. Since Cyrus stores each message as a separate file, this
> suggests that 2.5 MB/s corresponds to an average mail message size of
> 25 KB.
>
> 1. via metadata caching, read-ahead on file data reads, etc.
> 2. Assuming wsvc_t + asvc_t = 10 ms
> 3. Assuming that NetWorker is about as smart as tar, zip, cpio, etc.
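Quick sanity check on that math (taking your assumptions of roughly
10 ms of combined wsvc_t + asvc_t and one file in flight per backup
stream):

  1000 ms/sec / 10 ms per file    = ~100 files/sec per stream
  100 files/sec * ~25 KB per file = ~2.5 MB/sec per stream

which is right in line with the 2.25 MB/sec we're seeing on production.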
There is a backup of a single filesystem in the pool going on right now:

# zpool iostat 5 5
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
space       1.05T   965G     97     69  5.24M  2.71M
space       1.05T   965G    113     10  6.41M   996K
space       1.05T   965G    100    112  2.87M  1.81M
space       1.05T   965G    112      8  2.35M  35.9K
space       1.05T   965G    106      3  1.76M  55.1K

Here are examples:

# iostat -xpn 5 5
                    extended device statistics
   r/s   w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
  17.1  29.2   746.7   317.1  0.0  0.6    0.0   12.5   0  27 c4t60A98000433469764E4A2D456A644A74d0
  25.0  11.9   991.9   277.0  0.0  0.6    0.0   16.1   0  36 c4t60A98000433469764E4A2D456A696579d0
  14.9  17.9   423.0   406.4  0.0  0.3    0.0   10.2   0  21 c4t60A98000433469764E4A476D2F664E4Fd0
  20.8  17.4   588.9   361.2  0.0  0.4    0.0   11.5   0  30 c4t60A98000433469764E4A476D2F6B385Ad0

and:

   r/s   w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
  11.9  43.0   528.9  1972.8  0.0  2.1    0.0   38.9   0  31 c4t60A98000433469764E4A2D456A644A74d0
  17.0  19.6   496.9  1499.0  0.0  1.4    0.0   38.8   0  39 c4t60A98000433469764E4A2D456A696579d0
  14.0  30.0   670.2  1971.3  0.0  1.7    0.0   38.0   0  34 c4t60A98000433469764E4A476D2F664E4Fd0
  19.7  28.7   985.2  1647.6  0.0  1.6    0.0   32.5   0  37 c4t60A98000433469764E4A476D2F6B385Ad0

and:

   r/s   w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
  22.7  41.3   973.7   423.5  0.0  0.8    0.0   11.8   0  34 c4t60A98000433469764E4A2D456A644A74d0
  27.9  20.0  1474.7   344.0  0.0  0.8    0.0   16.7   0  42 c4t60A98000433469764E4A2D456A696579d0
  15.1  17.9  1318.7   463.7  0.0  0.6    0.0   17.7   0  19 c4t60A98000433469764E4A476D2F664E4Fd0
  22.3  19.5  1801.7   406.7  0.0  0.8    0.0   20.0   0  29 c4t60A98000433469764E4A476D2F6B385Ad0

> > My problem right now is diagnosing the performance issues. I can't
> > address them without understanding the underlying cause. There is a
> > lack of tools to help in this area. There is also a lack of acceptance
> > that I'm actually having a problem with ZFS. It's frustrating.
>
> This is a prime example of why Sun needs to sell Analytics[4][5] as an
> add-on to Solaris in general. This problem is just as hard to figure
> out on Solaris as it is on Linux, Windows, etc. If Analytics were
> bundled with Gold and above support contracts, it would be a very
> compelling reason to shell out a few extra bucks for a better support
> contract.
>
> 4. http://blogs.sun.com/bmc/resource/cec_analytics.pdf
> 5. http://blogs.sun.com/brendan/category/Fishworks

Oh, definitely! It will also give me the opportunity to yell at my
drives! Might help to relieve some stress.
http://sunbeltblog.blogspot.com/2009/01/yelling-at-your-hard-drive.html

> > Anyone know how to significantly increase the performance of a ZFS
> > filesystem without causing any downtime to an Enterprise email system
> > used by 30,000 intolerant people, when you don't really know what is
> > causing the performance issues in the first place? (Yeah, it sucks to
> > be me!)
>
> Hopefully I've helped find a couple places to look...

Thanx
--
Ed

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss