On Thu, Jun 25, 2009 at 1:27 PM, <johan...@sun.com> wrote:
>
> On Thu, Jun 25, 2009 at 11:16:50AM -0700, Sean Liu wrote:
> > Now that the ZFS cache will eat up all the free memory for caching,
> > that's all fine. But vmstat no longer reports meaningful free memory.
> > Yes, ::memstat can tell you the ZFS file data size, but who wants to
> > run mdb every now and then?
> >
> > Can we introduce some new kernel variable, the way we introduced the
> > cachelist, and make vmstat's free memory report meaningful again?
>
> I disagree with the premise that the free memory calculation isn't
> useful because it doesn't count ZFS caches as free space. They aren't
> free space. The memory is in use by the system, and unless the system
> is able to reduce the size of the cache, that memory isn't going to be
> available to your application. The ZFS caches are in kmem, and are
> maintained differently than the freelist and cachelist, which are
> actual lists of unused pages. Pages in the ARC are mapped by segkmem
> and allocated through the kmem cache; as far as the system is
> concerned, these pages are in use.
>
> Since kmem has to free all of the objects allocated from a slab before
> it can return the space to vmem, heap fragmentation and subsystems
> that don't have reliable memory-release callbacks can make it
> difficult for kmem to reclaim unused space. If you're on an ONNV
> kernel, get into mdb -k and run ::kmem_slabs to see how much space is
> wasted due to fragmentation in a particular cache.
>
> The space that's consumed by ZFS is in use and can't simply be peeled
> off a free list and used immediately. Adding a variable that counts
> space used by ZFS as free would actually be more confusing.
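The slab constraint described above can be illustrated with a toy model (an illustration only, not the actual Solaris kmem allocator): a slab can be returned to vmem only once every object allocated from it has been freed, so a handful of live objects scattered across many slabs can pin a large amount of otherwise-free memory. The object counts and sizes below are made up for the example.

```python
# Toy model of slab-based allocation (illustration only; not the real
# Solaris kmem implementation). Each slab holds OBJS_PER_SLAB objects
# and can be returned to vmem only when *all* of its objects are free.

OBJS_PER_SLAB = 8
OBJ_SIZE = 512  # bytes, hypothetical

def reclaimable_bytes(slabs):
    """Bytes the allocator could hand back to vmem right now.

    `slabs` is a list of per-slab live-object counts. Only slabs with
    zero live objects are reclaimable; partially used slabs are pinned.
    """
    return sum(OBJS_PER_SLAB * OBJ_SIZE for live in slabs if live == 0)

def wasted_bytes(slabs):
    """Free space trapped inside partially used slabs (fragmentation)."""
    return sum((OBJS_PER_SLAB - live) * OBJ_SIZE
               for live in slabs if 0 < live < OBJS_PER_SLAB)

# 100 slabs, each still holding a single live object: nothing at all is
# reclaimable, yet 7/8 of the cache's memory is free-but-pinned.
slabs = [1] * 100
print(reclaimable_bytes(slabs))  # 0
print(wasted_bytes(slabs))       # 358400
```

This is the pattern ::kmem_slabs makes visible per cache: a low ratio of reclaimable to allocated slabs despite many free objects.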
But... Suppose that I have a machine with 8 GB of memory and 12 GB of
stuff on disk. A trivial web server is running that serves up some
static pages now and then; for argument's sake, its working set is
1 GB. Due to a full backup that happened a week or a month ago,
somewhere between 6 and 8 GB of file system contents are in the ARC.

Most administrators, I think, would like at least the non-active
portion of that cache (not used recently, whatever that means) to
appear as free memory. As it is, anyone who looks at the free memory
reported by vmstat or similar tools will conclude that there is no
memory available to fire up a J2EE app that needs 2 GB as it pushes
bits between a web front end and a database on different hosts. In
reality, the app could be started and run with no real difference in
system performance (or free memory).

Observability of ZFS memory usage is important. Unless one digs into
kstats or ::memstat, memory used by ZFS for caching is unobservable,
and since most administrators don't know how to use such tools, I
consider ZFS caching to be unobservable to most. Even with kstats or
::memstat, there is either a lack of available data or missing
documentation for understanding whether performance would suffer by
adding applications to a system where a significant amount of memory
is used by the ARC.

Arguably the pendulum has just swung in a different direction. With
UFS, NFS, and other file systems, file system data and metadata may be
cached, but you can't see how much the cache is helping you. To date,
it seems the only way to know when increased memory consumption (in
heap, stack, etc.) will significantly hurt performance because of file
system caching is to actually experience the shortfall. This seems to
be true for Solaris 8, 9, and 10 without ZFS, as well as 10 with ZFS.

A key difference between the shortfall situation with UFS vs. ZFS is
in the use of vmstat -p.
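Before getting to vmstat -p, the 8 GB scenario above can be put into numbers with a small sketch. Everything here is hypothetical: the figures are invented for the example, and on a real system the reported free memory would come from vmstat/kstat while the evictable portion of the ARC would have to be derived from the arcstats kstats.

```python
# Sketch of "effective free memory" for the hypothetical 8 GB machine
# described above. All numbers are invented for illustration.

GB = 1024 ** 3

freemem    = 1 * GB  # what vmstat reports as "free"
arc_size   = 6 * GB  # total ARC (backup blew the cache up to ~6 GB)
arc_pinned = 1 * GB  # ARC data assumed not evictable (assumption)

def effective_free(freemem, arc_size, arc_pinned):
    """Reported free memory plus ARC space that could be evicted
    under memory pressure."""
    return freemem + (arc_size - arc_pinned)

# vmstat alone suggests a 2 GB J2EE app won't fit; counting evictable
# ARC space, it fits comfortably.
app_demand = 2 * GB
print(freemem >= app_demand)                                    # False
print(effective_free(freemem, arc_size, arc_pinned) >= app_demand)  # True
```

The gap between those two answers is exactly the observability problem: the second number is the one an administrator wants, but vmstat only shows the first.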
When the disks are busy pulling pages in from UFS, the "fpi" column in
"vmstat -p" clues the administrator in on the fact that file system
pages are being paged in. This doesn't seem to be the case with ZFS.

-- 
Mike Gerdts
http://mgerdts.blogspot.com/

_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org