On Sat, Mar 6, 2010 at 12:50 PM, Erik Trimble <erik.trim...@sun.com> wrote:
> This is true. SSDs and HDs differ little in their ability to handle raw
> throughput. However, we often still see problems in ZFS associated with
> periodic system "pauses" where ZFS effectively monopolizes the HDs to write
> out its current buffered I/O. People have been complaining about this for
> quite awhile. SSDs have a huge advantage where IOPS are concerned, and
> given that the backing store HDs have to service both read and write
> requests, they're severely limited on the number of IOPS they can give to
> incoming data.
>
> You have a good point, but I'd still be curious to see what an async cache
> would do. After all, that is effectively what the HBA cache is, and we see
> a significant improvement with it, and not just for sync writes.

I see what you mean here. Because ZFS has to aggregate write data for a
short period (the lifetime of a txg) to avoid generating too many random
HDD writes, the HDD bandwidth during that period is wasted. For write-heavy
streaming workloads, especially ones that can easily saturate the bandwidth
of the HDD pool, ZFS can therefore perform worse than legacy file systems
such as UFS or ext3. The IOPS of the HDDs is not the limitation here; their
bandwidth is the root cause. This is a design choice in ZFS. Shortening the
txg commit period can alleviate the problem, since the amount of data that
has to be flushed to disk each time becomes smaller. Replacing the HDDs
with high-end FC disks may also help.

> I also don't know what the threshold is in ZFS for it to consider it time
> to do an async buffer flush. Is it time based? % of RAM based? Absolute
> amount? All of that would impact whether an SSD async cache would be
> useful.

IMHO, ZFS flushes data back to disk asynchronously every 5 seconds, which
is the default txg commit period.
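For what it's worth, the commit interval is tunable; here is a sketch of
shortening it via /etc/system, assuming your build exposes the
zfs_txg_timeout tunable (the name and the default value have varied across
builds, so check yours first):

```
* /etc/system -- illustrative only. zfs_txg_timeout is the txg commit
* interval in seconds; shrinking it trades smaller per-commit flushes
* for more frequent ones.
set zfs:zfs_txg_timeout = 1
```

As with any /etc/system change, this takes effect only after a reboot.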
ZFS will also flush data back to disk before the 5-second period expires,
based on an estimate of how much memory the current txg has consumed. This
is called the write throttle; see the following link:
http://blogs.sun.com/roch/entry/the_new_zfs_write_throttle

> --
> Erik Trimble
> Java System Support
> Mailstop: usca22-123
> Phone: x17195
> Santa Clara, CA
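P.S. The two commit triggers described above (the 5-second timer and the
memory-based write throttle) can be sketched as a toy model. This is
illustrative only, not the actual ZFS logic, and the DIRTY_DATA_CAP value
is made up:

```python
# Toy model of the two txg-commit triggers: a timer (the 5 s default
# from this thread) and a write throttle that forces an earlier commit
# when the dirty data buffered for the open txg exceeds a cap.

TXG_TIMEOUT_S = 5.0          # default txg commit interval (per thread)
DIRTY_DATA_CAP = 64 << 20    # hypothetical throttle threshold (64 MiB)

def should_commit_txg(seconds_since_open, dirty_bytes):
    """Return True if the open txg should be committed now."""
    if seconds_since_open >= TXG_TIMEOUT_S:
        return True                 # timer expired
    if dirty_bytes >= DIRTY_DATA_CAP:
        return True                 # write throttle kicks in early
    return False

# A streaming writer that dirties data faster than the pool can absorb
# will hit the throttle long before the timer does:
print(should_commit_txg(1.2, 80 << 20))   # True  (throttled)
print(should_commit_txg(1.2, 8 << 20))    # False (keep buffering)
```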
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss