On 07/28/09 17:13, Rich Morris wrote:
On Mon, Jul 20, 2009 at 7:52 PM, Bob Friesenhahn wrote:
Sun has opened internal CR 6859997. It is now in Dispatched state at
High priority.
CR 6859997 has recently been fixed in Nevada. This fix will also be in
Solaris 10 Update 9.
This fix speeds up the sequential prefetch pattern described in this CR
without slowing down other prefetch patterns. Some kstats have also
been added to help improve the observability of ZFS file prefetching.
-- Rich
CR 6859997 has been accepted and is actively being worked on. The
following info has been added to that CR:
This is a problem with the ZFS file prefetch code (zfetch) in
dmu_zfetch.c. The test script provided by the submitter (thanks Bob!)
does no file prefetching the second time through each file. This
problem exists in ZFS in Solaris 10, Nevada, and OpenSolaris.
This test script creates 3000 files each 8M long so the amount of data
(24G) is greater than the amount of memory (16G on a Thumper). With
the default blocksize of 128k, each of the 3000 files has 64 blocks.
The first time through, zfetch ramps up a single prefetch stream
normally. But the second time through, dmu_zfetch() calls
dmu_zfetch_find() which thinks that the data has already been
prefetched so no additional prefetching is started.
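
The behaviour described above can be illustrated with a deliberately
simplified toy model. This is only a sketch, not the code in
dmu_zfetch.c: ToyStream, toy_find() and read_file() are hypothetical
names, the ramp-up is collapsed into a single step, and the 64-block
file size assumes 8M / 128k in binary units. It shows only that a stale
stream record suppresses prefetching on the second pass.

```python
from dataclasses import dataclass


@dataclass
class ToyStream:
    """Hypothetical stand-in for a prefetch stream (not the real zfetch struct)."""
    start: int  # first block covered by the stream
    end: int    # one past the last block the stream believes was prefetched


def toy_find(streams, blkid):
    """Return True if some stream claims blkid has already been prefetched."""
    return any(s.start <= blkid < s.end for s in streams)


def read_file(streams, nblocks):
    """Read blocks 0..nblocks-1 in order; return the number of prefetches issued."""
    issued = 0
    for blkid in range(nblocks):
        if toy_find(streams, blkid):
            continue  # the stream thinks this block was already prefetched
        if not streams:
            streams.append(ToyStream(start=blkid, end=blkid))
        s = streams[0]
        # Extend the single stream to the end of the file in one step.
        issued += nblocks - s.end
        s.end = nblocks
    return issued


streams = []
first_pass = read_file(streams, 64)
# Assume the ARC has evicted the data here; the stream record is unchanged.
second_pass = read_file(streams, 64)
print(first_pass, second_pass)
```

In this model the second pass issues no prefetches because nothing
tells the stream that the data it covered is no longer cached.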
This problem is not seen with 500 files each 48M in length (still 24G
of data). In that case there's still only one prefetch stream but it
is reclaimed when one of the requested offsets is not found. The
reason it is not found is that the stream "strided" the first time through
after reaching the zfetch cap, which is 256 blocks. Files with no
more than 256 blocks don't require a stride. So this problem will
only be seen when the data from a file with no more than 256 blocks is
accessed after being tossed from the ARC.
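
As a worked check of the two cases above, the following sketch computes
the block count per file and compares it with the 256-block cap. It
assumes binary units (8M = 8 MiB, 128k = 128 KiB); blocks_per_file()
and needs_stride() are illustrative helper names, not ZFS functions.

```python
KIB = 1024
MIB = 1024 * KIB

BLOCK_SIZE = 128 * KIB  # default blocksize quoted above
ZFETCH_CAP = 256        # zfetch cap in blocks, as quoted above


def blocks_per_file(file_bytes, block_bytes=BLOCK_SIZE):
    """Number of blocks needed to hold file_bytes (ceiling division)."""
    return -(-file_bytes // block_bytes)


def needs_stride(file_bytes):
    """True if a sequential read of the file runs past the cap."""
    return blocks_per_file(file_bytes) > ZFETCH_CAP


for size_mib in (8, 48):
    nblocks = blocks_per_file(size_mib * MIB)
    print(size_mib, nblocks, needs_stride(size_mib * MIB))
```

Under these assumptions the 8M files stay under the cap while the 48M
files exceed it, which matches the observation that only the larger
files cause the stream to stride and later be reclaimed.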
The fix for this problem may be more feedback between the ARC and the
zfetch code. Or it may make sense to restart the prefetch stream
after some time has passed or perhaps whenever there's a miss on a
block that was expected to have already been prefetched.
On a Thumper running Nevada build 118, the first pass of this test
takes 2 minutes 50 seconds and the second pass takes 5 minutes 22
seconds. If dmu_zfetch_find() is modified to restart the prefetch
stream when the requested offset is 0 and more than 2 seconds have
passed since the stream was last accessed, then the time needed for the
second pass is reduced to 2 minutes 24 seconds.
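
The restart condition used in that experiment can be sketched as a
simple predicate. This is not the actual change to dmu_zfetch_find();
should_restart() and its parameter names are hypothetical, and the
times are plain seconds rather than whatever clock the kernel code
uses.

```python
RESTART_AFTER_SECONDS = 2.0  # idle threshold from the experiment above


def should_restart(offset, last_access, now, threshold=RESTART_AFTER_SECONDS):
    """Toy version of the experimental check.

    Restart the stream only when the read starts at offset 0 and the
    stream has been idle for more than `threshold` seconds.
    """
    return offset == 0 and (now - last_access) > threshold


# Re-reading a file from the start after a long idle period restarts the stream.
print(should_restart(offset=0, last_access=100.0, now=103.0))
# A read from the middle of the file does not, however long the stream was idle.
print(should_restart(offset=128 * 1024, last_access=100.0, now=110.0))
```

Tying the check to offset 0 limits it to the start of a file, so a
stream that is still being read sequentially is left alone.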
Additional investigation is currently taking place to determine if
another solution makes more sense. And more testing will be needed to
see what effect this change has on other prefetch patterns.
6412053 is a related CR which mentions that the zfetch code may not be
issuing I/O at a sufficient pace. This behavior is also seen on a
Thumper running the test script in CR 6859997 since, even when
prefetch is ramping up as expected, less than half of the available
I/O bandwidth is being used. However, more aggressive file
prefetching could increase memory pressure, as described in CRs 6258102
and 6469558.
-- Rich
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss