Precisely what I was thinking. Except that third column is "swapped out
processes" and how does one pull them back from the brink of never-never
land? What ladle reaches into that bucket, and when?
I don't rightly know.
Let's run through a thought experiment/demonstration:
Start off with a couple of different kinds of processes, and make
the assumption that a kernel thread is used for each process that is
invoked.
For this discussion, consider the following three types of process:
    compute bound   eg: main() { for(;;); }
    resource bound  eg: main() { for(;;) sleep(1); }
    i/o bound       eg: main() { for(;;) write(/dev/null, read(/dev/zero)); }
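That last one is loose pseudocode; a minimal compilable sketch of the same
idea (my own wording, not part of the original example set) could look
something like this, assuming the usual /dev/zero and /dev/null devices:

    /* i/o bound: copy bytes from /dev/zero to /dev/null forever */
    #include <fcntl.h>
    #include <unistd.h>

    int main(void) {
        char buf[512];
        int in  = open("/dev/zero", O_RDONLY);
        int out = open("/dev/null", O_WRONLY);
        for (;;) {
            ssize_t n = read(in, buf, sizeof buf);   /* kernel fills the buffer */
            if (n > 0)
                write(out, buf, n);                  /* kernel discards it again */
        }
    }

Each pass spends nearly all of its time in the kernel doing the read and
the write, which is the behaviour we want from an "i/o bound" stand-in.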
If you fire off a bunch of compute bound processes, at some point there
will be
more processes ready to run than there are cpus/cores to run them on.
They all will be ready to run whenever a cpu is available. This is
    r    the number of kernel threads in run queue
(IIRC, this does not include processes actually running on cpus...)
The scheduler (at least in simplistic terms) does this:
event_handler["scheduling quantum expired"] => {
Take the current process and put it at the end of the run queue
Take the process off the head of the run queue and make it the
current process
}
This ensures that all processes that are "ready to run" get run at some
point.
(The details of the scheduler are the topic for some other discussion about
scheduling policies, multi-processor aware kernels, process groups and
the like...)
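To make that handler a little more concrete, here is a toy round-robin
rotation over a fixed run queue (my own sketch of the idea above, nothing
like the real Solaris dispatcher):

    /* Toy round-robin: rotate the run queue on each "quantum expired" event. */
    #include <stdio.h>

    #define NPROCS 4

    typedef struct { int pid; } proc_t;

    static proc_t run_queue[NPROCS] = { {101}, {102}, {103}, {104} };
    static int current = 0;                 /* index of the current process */

    static void quantum_expired(void) {
        /* In a circular queue, putting the current process at the tail and
         * taking the next one off the head is just advancing the index.   */
        current = (current + 1) % NPROCS;
    }

    int main(void) {
        for (int tick = 0; tick < 8; tick++) {
            printf("tick %d: running pid %d\n", tick, run_queue[current].pid);
            quantum_expired();
        }
        return 0;
    }

Every process gets a turn, which is the "ready to run" guarantee described
above; real schedulers add priorities, multiple cpus and much more.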
Now you fire off an additional set of the resource or I/O bound processes.
You still have more processes to run than you have cpus, so the "r"un queue
is still in use. Sometimes these new processes are ready to run and other
times they are waiting for something asynchronous to happen - a kernel
timer to fire, an I/O operation to complete, some mutex to become unstuck,
whatever. The key point is that the kernel "knows" that the process is
waiting for something and therefore it is not "runnable" until that
something
happens. While these processes are "stuck", they are
    b    the number of blocked kernel threads that
         are waiting for resources (I/O, paging, etc.)
Whenever that something happens, the process is put back onto the end
of the run queue by the kernel and this narrative goes back to the above
scheduling loop.
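In the same toy model (again my own sketch, not kernel code), the blocked
threads simply sit on a second list, and a wakeup unlinks one of them and
puts it on the tail of the run queue:

    /* Toy model: a wakeup moves a process from the blocked list to the
     * tail of the run queue, where the scheduler will eventually reach it. */
    #include <stdio.h>

    typedef struct proc { int pid; struct proc *next; } proc_t;

    static proc_t *run_head, *run_tail;     /* "r": ready to run          */
    static proc_t *blocked;                 /* "b": waiting for something */

    static void enqueue_runnable(proc_t *p) {
        p->next = NULL;
        if (run_tail) run_tail->next = p; else run_head = p;
        run_tail = p;
    }

    /* Called when the thing a process was sleeping on finally happens. */
    static void wakeup(proc_t *p) {
        proc_t **pp = &blocked;
        while (*pp && *pp != p)              /* unlink p from the blocked list */
            pp = &(*pp)->next;
        if (*pp) { *pp = p->next; enqueue_runnable(p); }
    }

    int main(void) {
        static proc_t a = {201}, b = {202};
        a.next = &b; blocked = &a;           /* both start out blocked        */
        wakeup(&b);                          /* say its I/O just completed    */
        for (proc_t *p = run_head; p; p = p->next)
            printf("runnable: pid %d\n", p->pid);
        return 0;
    }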
Now, kick off a large number of these processes such that together they
require more RAM than your system actually contains. (The above
snippets of pseudocode are not good examples for this because they don't
actually consume much unshared memory and they don't make a mess of
what little they *do* use...) At the point where all of the system's
RAM is allocated, the system has to do something to make room for
a new process. When I last played in this area (don't ask :-), the
algorithm was simple: First, look for pages that can be reloaded easily
from somewhere else: read-only pages like those found in executable files
from the file system are easy to toss out and read back in. If there
aren't any of these easy
choices, then the system needs to choose a dirty page (one that has
content modified by a user process, such as a stack frame, data structures,
etc) and write it out to disk to make room. Once a piece of a process's
address space is paged or swapped out, the page tables are changed so
that they will generate a page fault when the process tries to access it.
The process will continue to run as usual (bouncing between running,
being on the run queue and being blocked for resources) until it tries
to access one of its pages that is no longer in memory. At that point,
IIRC, the process ends up as:
    w    the number of swapped out lightweight
         processes waiting ...
The kernel makes arrangements for the proper disk blocks to be read back
in and mapped to the proper VM page tables, but those operations take
lots of time to complete, relative to the instruction cycle time of the
CPU. When the requested data is back in memory, the process is removed
from the "w"ait queue and put back on the "r"un queue.
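If you want a fourth flavour of process that actually drives this paging
behaviour (my own addition - none of the snippets above do), a memory hog
that keeps dirtying more pages than the machine can spare will do it;
pick a size larger than your free RAM:

    /* memory bound: allocate a big chunk and keep dirtying every page of it */
    #include <stdlib.h>
    #include <unistd.h>

    int main(void) {
        size_t pagesize = (size_t)sysconf(_SC_PAGESIZE);
        size_t bytes    = 512UL * 1024 * 1024;   /* adjust to exceed free RAM */
        char  *p        = malloc(bytes);
        if (p == NULL)
            return 1;
        for (;;) {
            /* Touch every page so each one is dirty and has to be written
             * out to swap, not just dropped, when memory gets tight.      */
            for (size_t off = 0; off < bytes; off += pagesize)
                p[off] = (char)off;
        }
    }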
At what point does Solaris push back and say "no more, I'm busy"? I am now
You answered that in your follow-up email:
* Description : max_nprocs
* Maximum number of processes that can be created on a system.
* Description : maxuprc
* Maximum number of processes that can be created on a system
* by any one user.
The pushback _is_ a bit harsh - it says "no more" instead of "please
wait"...
Up to that point, processes get created and added to the "r" queue to be
managed by the scheduler and the VM system.
The dynamics of your testbed come into play - fork/execing a shell to
fork/exec a prioctl which exec's unzip which does a bunch of file I/O....
Your examples are not nearly as simple as the pseudo code I gave above :-)
It sounds like a perfect place to play with DTrace, which should help paint
a very pretty picture of what exactly is happening under the hood.
-John