Greg Price <gnpr...@gmail.com> added the comment:

> About the RSS memory, I'm not sure how Linux accounts the Unicode databases 
> before they are accessed. Is it like read-only memory loaded on demand when 
> accessed?

It stands for "resident set size", as in "resident in memory"; and it only 
counts pages of real physical memory. The intention is to count up pages that 
the process is somehow using.

Where the definition potentially gets fuzzy is if this process and another are 
sharing some memory.  I don't know much about how that kind of edge case is 
handled.  But one thing I think it's pretty consistently good at is not 
counting pages that you've nominally mapped from a file but haven't actually 
forced into physical memory by touching them.
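
(For reference, the number I'll be watching below is the VmRSS field in 
/proc/<pid>/status.  Here's a minimal pure-Python way to read it from inside 
the process -- Linux-only and purely illustrative; the demo below just greps 
for the same field:)

import os

def rss_kib():
    """Return this process's current VmRSS, in kiB, from /proc/<pid>/status."""
    with open(f'/proc/{os.getpid()}/status') as f:
        for line in f:
            if line.startswith('VmRSS:'):
                return int(line.split()[1])   # the value is reported in kB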

That is: say you ask for a file (or some range of it) to be mapped into memory 
for you.  This means it's now there in the address space, and if the process 
does a load instruction from any of those addresses, the kernel will ensure the 
load instruction works seamlessly.  But: most of it won't be eagerly read from 
disk or loaded physically into RAM.  Rather, the kernel's counting on that load 
instruction causing a page fault; and its page-fault handler will take care of 
reading from the disk and sticking the data physically into RAM.  So until you 
actually execute some loads from those addresses, the data in that mapping 
doesn't contribute to the genuine demand for scarce physical RAM on the 
machine; and it also isn't counted in the RSS number.
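
(Incidentally, if you want the opposite, eager behavior, Linux has a 
MAP_POPULATE flag that asks the kernel to pre-fault the whole mapping up 
front; CPython exposes it as mmap.MAP_POPULATE starting in Python 3.10.  A 
minimal sketch, with a made-up filename -- not something the demo below 
relies on:)

import mmap, os

fd = os.open('some-big-file', os.O_RDONLY)   # hypothetical path
m = mmap.mmap(fd, 0, prot=mmap.PROT_READ,
              flags=mmap.MAP_SHARED | mmap.MAP_POPULATE)
# With MAP_POPULATE the pages are generally read in and show up in RSS right
# away, instead of waiting for the first load instruction to fault them in.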


Here's a demo!  This 262392 kiB (269 MB) Git packfile is the biggest file lying 
around in my CPython directory:

$ du -k .git/objects/pack/pack-0e4acf3b2d8c21849bb11d875bc14b4d62dc7ab1.pack
262392  .git/objects/pack/pack-0e4acf3b2d8c21849bb11d875bc14b4d62dc7ab1.pack


Open it for read -- adds 100 kiB, not sure why:

$ python
Python 3.7.3 (default, Apr  3 2019, 05:39:12) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os, mmap
>>> os.system(f"grep ^VmRSS /proc/{os.getpid()}/status")
VmRSS:      9968 kB
>>> fd = os.open('.git/objects/pack/pack-0e4acf3b2d8c21849bb11d875bc14b4d62dc7ab1.pack', os.O_RDONLY)
>>> os.system(f"grep ^VmRSS /proc/{os.getpid()}/status")
VmRSS:     10068 kB


Map it into our address space -- RSS doesn't budge:

>>> m = mmap.mmap(fd, 0, prot=mmap.PROT_READ)
>>> m
<mmap.mmap object at 0x7f185b5379c0>
>>> len(m)
268684419
>>> os.system(f"grep ^VmRSS /proc/{os.getpid()}/status")
VmRSS:     10068 kB

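(Not in the transcript, but worth noting: the *virtual* size does jump by 
roughly the full length of the mapping at this point; checking VmSize instead 
of VmRSS would show it --

>>> os.system(f"grep ^VmSize /proc/{os.getpid()}/status")

-- it's only the resident count that stays put.)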

Cause the process to actually look at all the data (this takes about 10 
seconds)...

>>> sum(len(l) for l in m)
268684419
>>> os.system(f"grep ^VmRSS /proc/{os.getpid()}/status")
VmRSS:    271576 kB

RSS goes way up, by 261508 kiB (271576 - 10068)!  Oddly that's slightly less, 
by about 1 MB, than the file's 262392 kiB.


But wait, there's more. Drop that mapping, and RSS goes right back down (OK, 
keeps 8 kiB extra):

>>> del m
>>> os.system(f"grep ^VmRSS /proc/{os.getpid()}/status")
VmRSS:     10076 kB

... and then map the exact same file again, and it's *still* down:

>>> m = mmap.mmap(fd, 0, prot=mmap.PROT_READ)
>>> os.system(f"grep ^VmRSS /proc/{os.getpid()}/status")
VmRSS:     10076 kB

This last step is interesting because it's a certainty that the data is still 
physically in memory -- this is my desktop, with plenty of free RAM.  And it's 
even in our address space.  But because we haven't actually loaded from those 
addresses, it's still in memory only at the kernel's caching whim, and so 
apparently our process doesn't get "charged" or "blamed" for its presence there.
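
(One way to see the other side of that, continuing the same session -- this 
step isn't in the transcript above, and the exact numbers will vary, so I've 
left the output out: touch one byte per page and the whole file faults back 
in, so RSS jumps right back up.

>>> for offset in range(0, len(m), mmap.PAGESIZE):
...     _ = m[offset]   # reading a single byte forces that page into RAM
...
>>> os.system(f"grep ^VmRSS /proc/{os.getpid()}/status")
)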


In the case of running an executable with a bunch of data in it, I expect that 
the bulk of the data (and of the code for that matter) winds up treated very 
much like the file contents we mmap'd in.  It's mapped but not eagerly 
physically loaded; so it doesn't contribute to the RSS number, nor to the 
genuine demand for scarce physical RAM on the machine.
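
(One rough way to check that expectation on Linux: /proc/self/smaps lists, for 
every mapping, both its total Size and how much of it is actually resident 
(Rss).  Here's an illustrative sketch -- the smaps field layout is 
kernel-dependent, and on some builds unicodedata is compiled into the 
interpreter rather than shipped as a separate .so, in which case you'd look at 
the python binary's own mappings instead:)

def print_mapping_usage(path_substring):
    """Print the Size and Rss fields of each mapping whose path matches."""
    interesting = False
    with open('/proc/self/smaps') as f:
        for line in f:
            first_field = line.split()[0]
            if '-' in first_field:          # "start-end perms ..." begins a new mapping
                interesting = path_substring in line
                if interesting:
                    print(line.rstrip())
            elif interesting and first_field in ('Size:', 'Rss:'):
                print('    ' + line.strip())

import unicodedata                  # maps its tables into the process, mostly lazily
print_mapping_usage('unicodedata')  # expect Rss well below Size until the data is used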


That's a bit long :-), but hopefully informative.  In short, I think for us RSS 
should work well as a pretty faithful measure of the real memory consumption 
that we want to be frugal with.
