Yes, that makes sense.  For the first run, the pool has only just been mounted, 
so the ARC will be empty, with plenty of space for prefetching.

On the second run, however, the ARC is already full of the data we just 
read, and I'm guessing the prefetch code is less aggressive when there is 
already data in the ARC.  For normal use that may be what you want - it's 
trying to keep things in the ARC in case they're needed again.
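
As a sanity check it might be worth dumping the prefetch counters between 
runs.  On recent OpenSolaris builds the file-level prefetch (zfetch) code 
keeps its own kstats, and the ARC tracks prefetch hits and misses 
separately - the exact names may vary by build, but something like:

    # DMU-level prefetch (zfetch) counters
    kstat -p zfs:0:zfetchstats

    # ARC-level prefetch hit/miss counters
    kstat -p zfs:0:arcstats | grep prefetch

If the second run really is prefetching less, the zfetch hit/miss numbers 
should show it.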

However, that does mean ZFS prefetch is always going to suffer some 
performance degradation on a live system, although early signs are that it 
might not be so severe in snv_117.

I wonder if there is any tuning that can be done to counteract this.  Is 
there any way to tell ZFS to bias towards prefetching rather than 
preserving data in the ARC?  That might give better performance for 
scripts like this, or for random access workloads.
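
The only knobs I'm aware of are the global zfetch limits - I don't know of 
any ARC/prefetch bias control.  If someone wants to experiment, something 
like this in /etc/system might be a starting point (example values only, 
and these are private tunables that can change between builds):

    * Untested example values - raise the number of prefetch streams
    * and how far ahead each stream is allowed to run.
    set zfs:zfetch_max_streams = 16
    set zfs:zfetch_block_cap = 512

The same variables can be poked on a live system with mdb -kw, e.g. 
"echo zfetch_max_streams/W0t16 | mdb -kw", if rebooting isn't an option.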

Also, could there be any generic algorithm improvements that would help?  
Why should ZFS keep data in the ARC if it hasn't been re-used?  This 
script reads 8MB files, and the ARC should be using at least 1GB of RAM.  
That's a minimum of 128 files in memory, none of which would have been 
read more than once.  If we're reading a new file now, prefetching should 
be able to displace any old object in the ARC that hasn't been used - in 
this case all 127 previous files should be candidates for replacement.
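
For what it's worth, the ARC should already distinguish these cases 
internally: it keeps separate recently-used (MRU) and frequently-used 
(MFU) lists, and files read only once stay on the MRU side.  You can 
watch the balance between them - p is the target size of the MRU list, 
c the overall target, size the current total:

    # Watch the ARC's MRU/MFU balance
    kstat -p zfs:0:arcstats:p zfs:0:arcstats:c zfs:0:arcstats:size

So the information needed to treat those 127 once-read files as eviction 
candidates is there; the question is whether prefetch is allowed to push 
them out.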

I wonder how that would interact with an L2ARC.  If that were fast enough 
I'd certainly want to allocate more of the ARC to prefetching.
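
There does seem to be at least one related knob already: if I'm reading 
arc.c right, l2arc_noprefetch (default 1) stops prefetched buffers being 
cached in the L2ARC at all.  Flipping it live - again a private tunable, 
so handle with care:

    # Let prefetched (streaming) buffers into the L2ARC
    echo l2arc_noprefetch/W0t0 | mdb -kw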

Finally, would it make sense for the ARC to always reserve a certain 
percentage for prefetching, possibly with that percentage being tunable, 
allowing us to balance caching against prefetching according to the 
expected workload?
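
I'm imagining something like this in /etc/system - to be clear, this 
tunable doesn't exist today, it's just a sketch of what I'm suggesting:

    * Hypothetical setting: reserve 25% of the ARC for prefetched
    * buffers, leaving the rest for demand-read data.
    set zfs:zfs_arc_prefetch_percent = 25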

Ross