On Mar 6, 2010, at 1:38 AM, Zhu Han wrote:

> On Sat, Mar 6, 2010 at 12:50 PM, Erik Trimble <erik.trim...@sun.com> wrote:
> This is true. SSDs and HDs differ little in their ability to handle raw
> throughput. However, we often still see problems in ZFS associated with
> periodic system "pauses" where ZFS effectively monopolizes the HDs to write
> out its current buffered I/O. People have been complaining about this for
> quite a while. SSDs have a huge advantage where IOPS are concerned, and given
> that the backing-store HDs have to service both read and write requests,
> they're severely limited in the number of IOPS they can give to incoming data.
>
> You have a good point, but I'd still be curious to see what an async cache
> would do. After all, that is effectively what the HBA cache is, and we see a
> significant improvement with it, and not just for sync writes.
>
> I think I see what you mean here. Because ZFS has to aggregate write data
> over a short period (the lifetime of a transaction group) to avoid generating
> too many random HDD writes, the bandwidth of the HDDs during that period is
> wasted. For a write-heavy streaming workload, especially one that can easily
> saturate the bandwidth of the HDD pool, ZFS will perform worse than legacy
> file systems such as UFS or ext3. The IOPS of the HDDs is not the limitation
> here; their bandwidth is the root cause.
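The periodic "pauses" described above are easy enough to observe for yourself. Here is a minimal, untested sketch (the file path and run length are arbitrary placeholders) that streams sequential 1 MiB writes and prints the achieved rate once per second; on a pool where the txg commit monopolizes the disks, the reported rate dips at each commit:

#!/usr/bin/env python
# Minimal sketch: stream sequential 1 MiB writes and report the achieved
# rate once per second.  Dips in the reported rate line up with the
# periodic txg commits discussed above.
import sys, time

path = sys.argv[1] if len(sys.argv) > 1 else "/tank/streamtest.dat"  # arbitrary test path
block = b"\0" * (1 << 20)     # 1 MiB per write
duration = 60.0               # seconds to run

f = open(path, "wb")
start = last = time.time()
written = 0
try:
    while time.time() - start < duration:
        f.write(block)
        written += len(block)
        now = time.time()
        if now - last >= 1.0:
            print("%8.1f MB/s" % (written / (now - last) / 1e6))
            written, last = 0, now
finally:
    f.close()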
The claim that HDD bandwidth is the root cause is too simple, and thus does not represent reality very well. For a fully streaming workload where the load is near the capacity of the storage, the algorithms in ZFS work to optimize the match. There is still some work to be done, but I don't believe UFS has beaten ZFS on Solaris in a significant streaming benchmark for several years now.

What we do see is that high-performance SSDs can saturate the SAS/SATA link for extended periods of time. For example, a Western Digital SiliconEdge Blue (a new, midrange model) can read at 250 MB/sec, in contrast to a WD RE4, which has a media transfer rate of 138 MB/sec. High-speed SSDs are already putting the hurt on 6 Gbps SAS/SATA -- the Micron models claim 370 MB/sec sustained. Since this can be easily parallelized, expect that high-end SSDs will saturate whatever you can connect them to. This is one reason why the F5100 has 64 SAS channels for host connections. (A back-of-the-envelope sketch of that arithmetic is appended after my signature.)

> This is the design choice of ZFS. Reducing the length of the txg commit
> period can alleviate the problem, so that the amount of data needing to be
> flushed to disk each time is smaller. Replacing the HDDs with some high-end
> FC disks may also solve this problem.

Properly matching the I/O source and sink is still important; no file system can relieve you of that duty :-)

> I also don't know what the threshold is in ZFS for it to consider it time to
> do an async buffer flush. Is it time based? % of RAM based? An absolute
> amount? All of that would impact whether an SSD async cache would be useful.

The answer is "yes" to all of these questions, but there are many variables to consider, so YMMV.

--
richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
http://nexenta-atlanta.eventbrite.com (March 16-18, 2010)
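P.S. The lane-saturation arithmetic mentioned above, as a rough sketch. This is my own back-of-the-envelope estimate, not a measurement: it assumes 8b/10b encoding (payload of roughly line rate / 10 in bytes/sec), ignores protocol overhead, and reuses the device numbers quoted in this thread.

#!/usr/bin/env python
# Back-of-the-envelope: how many devices of a given sustained rate does it
# take to fill one SATA/SAS lane?  Assumes 8b/10b encoding, so payload is
# roughly line rate / 10 in bytes/sec; protocol overhead is ignored.
import math

link_gbps = 6.0      # one 6 Gbps SAS/SATA lane
ssd_mb_s  = 370.0    # claimed sustained rate of a high-end SSD (Micron)
hdd_mb_s  = 138.0    # WD RE4 media transfer rate, for contrast

payload_mb_s = link_gbps * 1000.0 / 10.0   # ~600 MB/s of payload
print("usable lane bandwidth: ~%.0f MB/s" % payload_mb_s)
print("SSDs to fill it: %d" % math.ceil(payload_mb_s / ssd_mb_s))   # -> 2
print("HDDs to fill it: %d" % math.ceil(payload_mb_s / hdd_mb_s))   # -> 5

Under those assumptions, two such SSDs already oversubscribe a single 6 Gbps lane, which is one reason fan-out such as the F5100's 64 host channels matters.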