Hi Ben,

Ben Rockwood wrote:
m...@bruningsystems.com wrote:
Hi Ben,
Ben Rockwood wrote:
I'm curious as to why memory statistics seem to be so difficult to get
accurate.  If you use kstats, mdb ::memstat, and add up VSZ/RSS
from ps, you get numbers that are different, although close.

Can anyone shed some light on why this is?  I've assumed that ::memstat
is the most accurate measure and I'm comparing my numbers against it,
but perhaps it is not the best validation?
Have you tried getting the numbers on a crash dump?  If you are
doing this on a running system, I would expect the numbers to
fluctuate.

No, I'm not interested in getting so exact that the numbers don't matter
because the system isn't running. :)

Here is an example:

::memstat
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                     377207              1473    4%
Anon                        37794               147    0%
Exec and libs                5055                19    0%
Page cache                  12200                47    0%
Free (cachelist)            18828                73    0%
Free (freelist)           7934179             30992   95%

Total                     8385263             32754
Physical                  8385262             32754

zfs:0:arcstats:size             16504880        <--- 16,504,880 bytes
# kstat -p | grep -i system_pages
unix:0:system_pages:availrmem   8005191
unix:0:system_pages:class       pages
unix:0:system_pages:crtime      0
unix:0:system_pages:desfree     65509
unix:0:system_pages:desscan     25
unix:0:system_pages:econtig     4224880640
unix:0:system_pages:fastscan    1681006
unix:0:system_pages:freemem     7954235
unix:0:system_pages:kernelbase  0
unix:0:system_pages:lotsfree    131019
unix:0:system_pages:minfree     32754
unix:0:system_pages:nalloc      25203900
unix:0:system_pages:nalloc_calls        14944
unix:0:system_pages:nfree       23916528
unix:0:system_pages:nfree_calls 9475
unix:0:system_pages:nscan       0
unix:0:system_pages:pagesfree   7954235         <--- 31,816,940k (31071 MB)
unix:0:system_pages:pageslocked 380071
unix:0:system_pages:pagestotal  8385262
unix:0:system_pages:physmem     8385263         <--- 33,541,052k (32754 MB)
unix:0:system_pages:pp_kernel   379926          <---  1,519,704k (1484  MB)
unix:0:system_pages:slowscan    100
unix:0:system_pages:snaptime    2181598.72132125

So if we look at pages for Kernel, kstat pp_kernel says 379926 but
::memstat says 377207.  Of free, kstat pagesfree says 7954235 while
::memstat says 7934179.  The numbers are very close (within about 100MB
on a system with 32GB of memory) but not exact.
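To put numbers on the gap, here is a minimal sketch of the arithmetic using the counts quoted above. The 4 KB page size is an assumption, inferred from the annotations (7954235 pages shown as 31,816,940k); the helper name pages_to_mb is made up for illustration.

```python
PAGE_KB = 4  # assumption: 4 KB pages, implied by 7954235 pages -> 31,816,940k


def pages_to_mb(pages, page_kb=PAGE_KB):
    """Convert a page count to whole megabytes (truncating)."""
    return pages * page_kb // 1024


# Values copied from the kstat and ::memstat output above.
kstat = {"pp_kernel": 379926, "pagesfree": 7954235}
memstat = {"Kernel": 377207, "Free (freelist)": 7934179}

kernel_delta = kstat["pp_kernel"] - memstat["Kernel"]
free_delta = kstat["pagesfree"] - memstat["Free (freelist)"]

print("kernel delta:", kernel_delta, "pages,", pages_to_mb(kernel_delta), "MB")
print("free delta:  ", free_delta, "pages,", pages_to_mb(free_delta), "MB")
```

With these inputs the kernel figures differ by 2719 pages (about 10 MB) and the free figures by 20056 pages (about 78 MB), which is consistent with "within about 100MB".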
The pagesfree value that kstats reports is the same variable used by
::memstat for free memory.  Both of these are the freemem variable.
freemem is updated every clock tick.
If you try to work out "used" memory by, for instance, taking total mem,
subtracting free mem, then dividing it between kernel and user you
similarly get inconsistent numbers depending on where you get your
numbers (::memstat, vs kstat, vs adding up ps RSS numbers).
Based on the way ::memstat works, it should be the most accurate value.
However, it makes multiple passes through all pages of pageable memory, so things
can change during the passes.  In fact, this could happen even if it only
made a single pass.  What I find a little more interesting is the following.
First, ::memstat output...

::memstat
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                     110120               430   21%
Anon                       192334               751   37%
Exec and libs               26664               104    5%
Page cache                   2254                 8    0%
Free (cachelist)            32993               128    6%
Free (freelist)            157473               615   30%

Total                      521838              2038
Physical                   521837              2038

memstat uses the page walker to dump everything up to Free (freelist).
Here is a check on the page list for free pages

::walk page | ::print page_t p_state ! egrep '80|90|c0|a0' | wc
32985 98955 494775 <-- 32985 free pages (almost exactly the Free (cachelist) count)

So, how many pages are on the page list...

::walk page ! wc  <-- this is the walker ::memstat uses to examine pages
 367695  367695 3309255   <-- so 367695 page_t (i.e., pageable pages)

however,

physmem::print -d           <-- variable used for Total in ::memstat
0t521838                          <-- but 521838 pages of physical memory

So, where are the 154143 pages? (This is about 600MB of memory on my 2GB machine).
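For anyone checking the math, a small sketch of where the 154143 figure comes from. The 4 KB page size is an assumption (it is not shown for this machine); the two page counts are the ones from the mdb output above.

```python
PAGE_KB = 4  # assumption: 4 KB pages on this machine

physmem_pages = 521838  # from physmem::print -d
walked_pages = 367695   # from ::walk page ! wc

# Pages that exist physically but never show up in the page walk.
missing_pages = physmem_pages - walked_pages
missing_mb = missing_pages * PAGE_KB // 1024
missing_fraction = missing_pages / physmem_pages

print(missing_pages, "pages,", missing_mb, "MB,",
      f"{missing_fraction:.1%} of physical memory")
```

That works out to 154143 pages, roughly 602 MB, or a bit under 30% of physical memory that the walker does not visit.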

It turns out that the page walker only walks pages that are "hashed", i.e.,
have vnode/offset identity.  A page that does not have an identity is not
listed.  (To see all pages, you can use ::memseg_list and go from there...
left as an exercise.  I've done this and now get a total number of pages in
agreement with physmem.)

To see pages used by the kernel (not counting zfs), you can do:
::walk page | ::print page_t p_vnode !grep kvp | wc
116728 350184 1634192 <-- so 116728 kernel pages (note that this was done
                      <-- a while after the ::memstat above)

Pages used for zfs data may use the zvp vnode (but not on my build), so
you can use the above walker and substitute zvp for kvp.

I assume you are using something like "ps -e -o rss,comm" to dump RSS
numbers for processes.  Remember that many of the pages are shared between
processes, so summing RSS counts those pages more than once.
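A toy illustration of why adding up RSS overstates memory use. Everything here is hypothetical: the process names and page numbers are invented, with pages 1-3 standing in for a shared library mapped by all three processes. It is only a sketch of the counting problem, not of how ps computes RSS.

```python
# Hypothetical data: the set of physical pages resident for each process.
resident = {
    "sshd": {1, 2, 3, 10, 11},  # pages 1-3 stand in for a shared library
    "bash": {1, 2, 3, 20},
    "cron": {1, 2, 3, 30, 31},
}

# Summing per-process RSS counts every shared page once per process.
rss_sum = sum(len(pages) for pages in resident.values())

# Counting distinct pages counts each physical page exactly once.
distinct = len(set().union(*resident.values()))

print("sum of RSS:    ", rss_sum, "pages")
print("distinct pages:", distinct, "pages")
```

Here the RSS sum is 14 pages while only 8 distinct pages are actually in use, so the sum over-counts by the 6 extra references to the shared pages.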


This only really causes trouble because if you try to offer a sysadmin a
breakdown of memory it is:

1) Approx within maybe 5% of reality
2) Must be pulled from a single source (kstat?) or the numbers don't
even line up as approx

And the problem is that this is hard for an end user to swallow... if
they attempt to check your math they'll see it's not right and discard
the data as useless.

Thus, perhaps the best takeaway is that this is why there is no Solaris
"memstat" tool for the CLI short of using mdb, and that the more
accurate observations are based on how memory is changing
(vmstat/mpstat) rather than absolutely what it is at any given point
(::memstat).
This I agree with.
Am I accurate here or out to lunch?
uhhh... yes(?)
max

_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org
