On Thu, Jun 25, 2009 at 1:27 PM, <johan...@sun.com> wrote:
>
> On Thu, Jun 25, 2009 at 11:16:50AM -0700, Sean Liu wrote:
> > Now that the ZFS cache will eat up all the free memory for caching,
> > again that's all fine.  But vmstat no longer reports meaningful free
> > memory.  Yes, ::memstat can tell you the ZFS file data size, but who
> > wants to run mdb every now and then?
> >
> > Can we introduce a new kernel variable, the way we introduced the
> > cachelist, and make vmstat's free memory report meaningful again?
>
> I disagree with the premise that the free memory calculation isn't
> useful because it doesn't count ZFS caches as free space.  They aren't
> free space.  The memory is in use by the system, and unless it's able to
> reduce the size of the cache, that memory isn't going to be available to
> your application.  The ZFS caches are in kmem, and are maintained
> differently than the freelist and cachelist, which are actually lists of
> unused pages.  Pages in the ARC are mapped by segkmem and allocated
> through the kmem cache; as far as the system is concerned, these pages
> are in use.
>
> Since kmem has to free all of the objects allocated from a slab before
> it can return the space to vmem, heap fragmentation and subsystems that
> don't have reliable memory-release callbacks can make it difficult for
> kmem to reclaim unused space.  If you're on an ONNV kernel, get into
> mdb -k and run ::kmem_slabs to see how much space is wasted due to
> fragmentation in a particular cache.
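>
> As a sketch, that inspection might look like the following.  ::kmem_slabs
> and ::kmem_cache are mdb dcmds; running them against the live kernel
> requires privileges, and the zio_buf cache-name pattern is an assumption
> about which caches hold ARC data:
>
> ```shell
> # Sketch: examine kmem slab fragmentation on a live ONNV kernel (needs root).
> # List kmem caches with their slab usage; wasted space shows up as
> # partially full slabs that cannot be returned to vmem.
> echo '::kmem_slabs' | mdb -k
>
> # To drill into the caches that back the ARC, list all kmem caches and
> # look for the zio_buf caches (name pattern assumed):
> echo '::kmem_cache' | mdb -k | grep zio_buf
> ```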
>
> The space that's consumed by ZFS is in use and can't simply be peeled
> off a free-list and used immediately.  Adding a variable that counts
> space used by ZFS as free is actually going to be more confusing.

But...

Suppose that I have a machine with 8 GB of memory and 12 GB of stuff
on disk.  There is a trivial web server running that serves up some
static pages now and then.  For argument's sake, the working set of
this web server is 1 GB.  Due to a full backup that happened a week or
a month ago, somewhere between 6 & 8 GB of file system contents are in
the ARC.  Most administrators, I think, would like to have at least
the non-active portion of that cache (not used recently, whatever that
means) appear as free memory.  As it is, anyone who looks at the free
memory reported by
vmstat or similar will see that there is no available memory to fire
up a J2EE app that will require 2 GB of memory as it pushes bits
between a web front-end and a database that are on different hosts.
In reality, the app could be started and run with no real difference
in system performance (or free memory).

Observability of zfs memory usage is important.  Unless one digs into
kstats or ::memstat, memory used by ZFS for caching is unobservable.
Since most administrators don't know how to use such tools, I consider
ZFS caching to be unobservable to most.  Even with kstats or
::memstat, the data or the documentation needed to understand whether
performance would suffer when applications are added to a system with
a significant amount of memory in the ARC is simply missing.
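
To make the point concrete, here is roughly what that digging looks
like today (a sketch; the statistic names come from the zfs arcstats
kstat, and ::memstat requires privileges on the live kernel):

```shell
# Sketch: the tools an administrator must already know to see ARC memory use.
# Current ARC size and target size, from the zfs:0:arcstats kstat:
kstat -p zfs:0:arcstats:size
kstat -p zfs:0:arcstats:c

# System-wide page breakdown, including the "ZFS File Data" line (needs root):
echo '::memstat' | mdb -k
```

Nothing in either output says how much of that memory could be given
back without hurting performance, which is the question the
administrator actually has.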

Arguably, the pendulum has just swung in a different direction.  With
UFS, NFS, and other file systems, file system data and metadata may be
cached, but you can't see how much that caching is helping you.  To
date, the only way to understand when increased memory consumption (in
heap, stack, etc.) will significantly hurt performance by squeezing
out the file system caches seems to be to actually experience the
shortfall.  This seems to be true for Solaris 8, 9, and 10 without
ZFS, as well as for 10 with ZFS.

A key difference between the shortfall situation with UFS vs. ZFS is
in the use of vmstat -p.  When the disks are busy pulling pages in
from UFS, the "fpi" column in "vmstat -p" clues the administrator in
on the fact that file system pages are being paged in.  This doesn't
seem to be the case with ZFS.
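
For comparison, the UFS-era signal looks like this (a sketch; the
column grouping is from Solaris vmstat's paging report):

```shell
# Sketch: watching file system page-in activity at 5-second intervals.
# The last column group ("filesystem": fpi/fpo/fpf) shows file system
# page-ins, page-outs, and frees.  A sustained fpi rate during a UFS
# cache shortfall is the clue that file data is being re-read from disk;
# ARC misses under ZFS do not register there.
vmstat -p 5
```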

--
Mike Gerdts
http://mgerdts.blogspot.com/
_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org