On Sat, Mar 6, 2010 at 12:50 PM, Erik Trimble <erik.trim...@sun.com> wrote:

> This is true.  SSDs and HDs differ little in their ability to handle raw
> throughput. However, we often still see problems in ZFS associated with
> periodic system "pauses" where ZFS effectively monopolizes the HDs to write
> out its current buffered I/O.  People have been complaining about this for
> quite awhile.  SSDs have a huge advantage where IOPS are concerned, and
> given that the backing store HDs have to service both read and write
> requests, they're severely limited on the number of IOPs they can give to
> incoming data.
>
> You have a good point, but I'd still be curious to see what an async cache
> would do.  After all, that is effectively what the HBA cache is, and we see
> a significant improvement with it, and not just for sync write.
>
>
I think I see what you mean. Because ZFS has to aggregate write data over a
short period (the lifetime of a txg) to avoid issuing too many random write
requests to the HDDs, the HDD bandwidth during that window goes unused. For
write-heavy streaming workloads, especially ones that can easily saturate the
HDD pool's bandwidth, ZFS can therefore perform worse than legacy file
systems such as UFS or EXT3. The IOPS of the HDDs are not the limitation
here; the bandwidth of the HDDs is the root cause.

This is a design choice of ZFS. Shortening the txg commit interval can
alleviate the problem, since less data then needs to be flushed to disk on
each commit. Replacing the HDDs with high-end FC disks may also help.
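As a back-of-envelope sketch of why the commit interval matters (all the
numbers below are made up for illustration, not measured):

```python
def txg_burst(ingest_mb_s, pool_bw_mb_s, txg_interval_s):
    """Estimate the size and duration of one txg flush burst.

    ingest_mb_s:    rate at which the application dirties data (MB/s)
    pool_bw_mb_s:   sequential write bandwidth of the HDD pool (MB/s)
    txg_interval_s: txg commit interval (seconds)
    """
    buffered_mb = ingest_mb_s * txg_interval_s   # data aggregated per txg
    flush_s = buffered_mb / pool_bw_mb_s         # time the burst occupies the disks
    idle_s = txg_interval_s - flush_s            # disk time left for reads;
                                                 # negative => txgs back up, app pauses
    return buffered_mb, flush_s, idle_s

# 100 MB/s ingest, 400 MB/s pool, 5 s txg: a 500 MB burst taking 1.25 s
print(txg_burst(100, 400, 5))   # -> (500, 1.25, 3.75)
# Shrinking the interval to 1 s makes each burst 5x smaller (100 MB, 0.25 s),
# but the average bandwidth demand (ingest/pool ratio) is unchanged.
print(txg_burst(100, 400, 1))   # -> (100, 0.25, 0.75)
```

The point of the sketch: a shorter interval spreads the same traffic into
smaller, more frequent bursts, so the "pause" felt by readers shrinks, but
once ingest approaches pool bandwidth the flushes become continuous and no
interval tuning can save you.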


> I also don't know what the threshold is in ZFS for it to consider it time
> to do an async buffer flush.  Is it time based?  % of RAM based? Absolute
> amount? All of that would impact whether an SSD async cache would be useful.
>
>
IMHO, ZFS flushes data back to disk asynchronously every 5 seconds, which is
the default txg commit period.  ZFS will also flush data before the 5-second
timer expires, based on an estimate of how much memory the current txg has
consumed.  This is called the write throttle; see:
http://blogs.sun.com/roch/entry/the_new_zfs_write_throttle
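In rough pseudocode form, the idea described in that post looks something
like the following. This is only an illustrative toy, not the actual ZFS
logic, and the limit and threshold constants are invented:

```python
DIRTY_LIMIT_MB = 500   # hypothetical cap on dirty data per txg
THROTTLE_AT = 0.8      # hypothetical point at which writers get slowed down

def admit_write(dirty_mb, write_mb):
    """Decide what happens to one incoming write given current dirty data.

    Returns (new_dirty_mb, action). Mirrors the two mechanisms in the blog
    post: commit the txg early when the memory estimate hits the limit, and
    delay (throttle) writers as the limit is approached.
    """
    new_dirty = dirty_mb + write_mb
    if new_dirty >= DIRTY_LIMIT_MB:
        return new_dirty, "commit txg now"   # flush before the 5 s timer fires
    if new_dirty >= THROTTLE_AT * DIRTY_LIMIT_MB:
        return new_dirty, "delay writer"     # throttle the producer instead of stalling later
    return new_dirty, "accept"

print(admit_write(0, 100))    # -> (100, 'accept')
print(admit_write(390, 20))   # -> (410, 'delay writer')
print(admit_write(480, 30))   # -> (510, 'commit txg now')
```

Delaying each writer a little as the limit nears is what smooths out the
big system-wide pauses people complain about: the cost is paid continuously
in small amounts rather than all at once at commit time.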



> --
> Erik Trimble
> Java System Support
> Mailstop:  usca22-123
> Phone:  x17195
> Santa Clara, CA
>
>
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss