On 11/4/05, Mike Gerdts <[EMAIL PROTECTED]> wrote:
> It seems as though this would make it so that the type of
> summary data that taskstat was written to collect could easily be
> integrated into prstat by observing the kstats. Presumably this
> would mean that prstat -[aTJZ] summary lines would come from kstats
> instead of iterating over the active processes (and missing out on the
> ones that died).
It sounds like a great idea to me. Would you like to work on that? ;-)
You can download the source code for OpenSolaris, modify and build it
yourself, and I'd be happy to assist you.
- Andrei
Since my previous message, I have been wondering how intrusive this really needs to be.
1) My initial thoughts...
I was looking at main.c (http://cvs.opensolaris.org/source/xref/on/usr/src/uts/common/os/main.c) and thought that a thread could be created around the time that the thread_reaper daemon thread is created. This new thread would wake up once per interval and look at each process to update per zone/user/project kstats. Then, in proc_exit(), a call would need to be made around the time of the call to exacct_commit_proc() to update the kstats for the exiting process.
Pros: This seems to be the most straightforward way to do what I am after. Many user processes could observe the summary data with little overhead, since each would only be reading summary data. Non-privileged users would be able to observe the data via kstat.
Cons: This code would be hard to disable and changing the frequency of updates would likely be an /etc/system change and a reboot. I guess the interval could be changed by firing up mdb.
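To make option 1 concrete, here is a minimal userland sketch in C of the accounting it implies. It is not kernel code, and every name in it (proc_sample, summary_scan, summary_proc_exit, MAX_GROUPS) is made up for illustration. The idea is that each process remembers how much of its CPU time has already been added to its group counter, so the periodic scan and the exit hook can both add only the unaccounted remainder without double counting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_GROUPS 8   /* hypothetical cap on zones, users, or projects */

/* Hypothetical stand-in for the per-process data the kernel keeps. */
struct proc_sample {
    int      group;      /* index of the zone, user, or project */
    uint64_t cpu_nsec;   /* CPU time consumed so far */
    uint64_t accounted;  /* portion already added to the group counter */
};

/* Hypothetical stand-in for one monotonically increasing kstat per group. */
static uint64_t group_cpu_nsec[MAX_GROUPS];

/* Add whatever this process has used since it was last accounted. */
static void summary_account(struct proc_sample *p)
{
    assert(p->group >= 0 && p->group < MAX_GROUPS);
    assert(p->cpu_nsec >= p->accounted);
    group_cpu_nsec[p->group] += p->cpu_nsec - p->accounted;
    p->accounted = p->cpu_nsec;
}

/* Body of the periodic thread: visit every live process once. */
void summary_scan(struct proc_sample *procs, size_t nprocs)
{
    for (size_t i = 0; i < nprocs; i++)
        summary_account(&procs[i]);
}

/* What the call near exacct_commit_proc() would do for a dying process. */
void summary_proc_exit(struct proc_sample *p)
{
    summary_account(p);
}

/* What a kstat reader would see for one group. */
uint64_t summary_total(int group)
{
    assert(group >= 0 && group < MAX_GROUPS);
    return group_cpu_nsec[group];
}
```

Without the exit hook, any CPU time a process used between the last scan and its death would be lost, which is the same problem prstat has today. A real implementation would also need locking around the counters, which this sketch leaves out.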
2) Then I thought...
With the first approach, it seems as though it would be tough to disable this "feature" without a reboot should it be a suspect for getting in the way. It seems as though the same could be accomplished via a kernel module that does the thread creation as part of its initialization. Perhaps getting a pointer to pid 0 would be a bit harder, but I suspect that this could be overcome. The tricky part is getting the hook into proc_exit(). I began to wonder if it would be possible for a kernel module to intercept the call to exacct_commit_proc() via a dtrace probe.
Pros: It would be easy to disable this module on a running system or to prevent it from loading in the first place. It may appear to be less intrusive because it doesn't touch main.c and exit.c, making it more palatable. Many user processes observing the collected data would have minimal performance impact. If methods other than dtrace are available for intercepting calls, this could provide per-user and per-project kstats for Solaris 8 and 9. Non-privileged users would be able to access the data via kstat.
Cons: Not clear that it would be possible to intercept proc_exit() or exacct_commit_proc() from within a kernel module.
3) Then a bit further...
Well, there is this taskstat command that runs in userspace, presumably written in C or Perl. It probably does everything except capturing the short-running processes. In order to do that, I would need to intercept the calls to exacct_commit_proc() with dtrace. Can I access dtrace probes directly from C or Perl? Or would I need to fire off a dtrace script that has a tick probe giving appropriate summaries for exited processes and have that output piped to the C or Perl code that calls getacct()?
Pros: There would be no need to touch the kernel code.
Cons: It would have to run as root or a user with elevated privileges. Multiple users observing the data could have significant impact, especially if there are thousands of processes. Possibly a mixture of languages to accomplish the task. Data would not be available via kstat.
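For the piped variant of option 3, the C side could be little more than a line reader. The sketch below assumes, purely for illustration, that the dtrace script prints one "projid nanoseconds" pair per line for exited processes. That output format, the function name read_exit_records, and the MAX_PROJECTS limit are all my own inventions, not anything taskstat or dtrace defines.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

#define MAX_PROJECTS 16   /* hypothetical cap, just to keep the sketch small */

/*
 * Read "projid nsec" lines from 'in' (for example, a pipe from a dtrace
 * script) and add each value to totals[projid]. Lines that do not parse,
 * or that name a project outside the table, are skipped.
 * Returns the number of records that were added.
 */
size_t read_exit_records(FILE *in, uint64_t totals[MAX_PROJECTS])
{
    char line[128];
    size_t added = 0;

    assert(in != NULL && totals != NULL);
    while (fgets(line, sizeof line, in) != NULL) {
        unsigned int projid;
        uint64_t nsec;

        if (sscanf(line, "%u %" SCNu64, &projid, &nsec) != 2)
            continue;               /* not a record, ignore it */
        if (projid >= MAX_PROJECTS)
            continue;               /* outside the hypothetical table */
        totals[projid] += nsec;
        added++;
    }
    return added;
}
```

The totals for exited processes would then be added to whatever the tool computes for live processes on each refresh. This keeps the kernel untouched, but every observer pays for its own dtrace consumer and its own pass over the data, which is the overhead listed under Cons.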
4) And possibly over the edge...
But none of this would really be necessary if dtrace could simply wake up once per second and walk the process table all on its own. A monitoring tool that gives the requested information would probably take about 10 lines of code.
Pros: It would probably not be that much code to write such a dtrace script.
Cons: Last I looked, there did not seem to be any looping constructs to be able to walk the process list. As such, dtrace would need to have its grammar and functionality enhanced to support loops.
In summary...
I am open to others' thoughts on the best way to approach this problem. I think I am most partial to option 2, assuming kernel modules are able to somehow intercept other kernel calls (presumably with an fbt::exacct_commit_proc() probe). If option 1 turns out to be the favored mechanism, have I picked the best places to create the zone/process/user kstats thread?
Mike
_______________________________________________ perf-discuss mailing list perf-discuss@opensolaris.org