G'Day Mike,

On Wed, 2 Nov 2005, Mike Gerdts wrote:

> One of the more difficult performance monitoring problems that I have
> come across is determining the impact of multiple workloads running on
> a server.  Consider a server that has about 1000 database processes
> that are long running - many minutes to many months - mixed with batch
> jobs written in Bourne shell.  Largely due to the batch jobs, it is
> not uncommon for sar to report hundreds of forks and execs per second.
>
> There is somewhat of a knee-jerk reaction to move the batch jobs off
> of the database server.  However, quantifying how much of an impact
> this would have is somewhat hard to do.  Trying to use "prstat -a" or
> "prstat -J" does not seem to give a very accurate picture.  My guess
> is that prstat will tend to miss out on all of the processes that were
> very short lived.

procfs-based tools miss out on short-lived processes (as a separate
process entry, anyway) due to sampling. "prstat -m" can do something
spooky with short-lived processes, for example,

   PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/NLWP
  6394 root      14  71 0.1 0.0 0.0 0.0 0.0  15   0 593 35K   0 /0
   937 root     2.2 8.4 0.0 0.0 0.0 0.0 4.2  85  26  24 11K 428 bash/1

PID 6394 has no name and no LWPs, a ghost process. (And a fairly busy one
too, 35000 syscalls ... Actually, I shouldn't criticise this as it may
well be a deliberate aggregation of short-lived processes, and I've found
it to be quite handy. Thank you ghost process!)

Now, the by-child usr/sys times from procfs should give us a crack at
solving this; they can be fetched using the -b option of
        http://www.brendangregg.com/Solaris/prusage
however I suspect they undercount usr/sys time. (Now that the OpenSolaris
code is public, I ought to go and read how they are incremented...)

> The best solution that I have come up with is to write extended
> accounting records (task) every few minutes, then to process the
> exacct file afterwards.  Writing the code to write exacct records
> periodically and make sense of them later is far from trivial.  It is
> also impractical for multiple users (monitoring frameworks,
> administrators, etc.) to make use of this approach on the same machine
> at the same time due to the fact that the exacct records need to be
> written and this is presumably a somewhat expensive operation to do
> too often.

Wow, are you baiting me to talk about DTrace? ;-)

Actually, nice idea with exacct; there are a few other things to try on
earlier Solaris releases to shed light on this problem (TNF tracing, BSM
auditing, ...).

Anyway, for SHORT-lived processes use shortlived.d from the DTraceToolkit,

 # shortlived.d
 Sampling.. Hit Ctrl-C to stop.
 ^C
 short lived processes:      0.456 secs
 total sample duration:      9.352 secs

 Total time by process name,
               date           12 ms
                 df           20 ms
                 ls           40 ms
               perl          380 ms

 Total time by PPID,
               3279          452 ms

I guess I should print out percentages as well. 0.4 from 9.3 seconds is
around 5%... Also run execsnoop as it's entertaining.

shortlived.d is handy for troubleshooting the problem in person; however,
if you want to write a daemon that permanently logs this info you'll need
to modify it somewhat. It doesn't enable probes that are likely to fire
terribly rapidly, so the performance impact should be negligible. My
biggest worry would be the DTrace daemon dying due to unresponsiveness,
and you missing out on log entries (better to wrap it in SMF), although
it's not really a big worry, as it usually only happens after you
kill -23 DTrace.

> It seems as though it should be possible for the kernel to maintain
> per-user, per-project, and per-zone statistics.  Perhaps collecting
> them all the time is not desirable, but it seems as though updating
> the three sets of statistics for each context switch would be lighter
> weight than writing accounting records then post processing them.  The
> side effect of having this data available would be that tools like
> prstat could report accurate data.  Other tools could likely get this
> data through kstat or a similar interface.

Kstat currently provides a number of CPU-related goodies. See
/usr/include/sys/sysinfo.h for the cpu_* structs.
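
The DTrace sysinfo provider fires as those cpu_sysinfo counters are
incremented, so they can already be sliced by keys other than CPU; for
example, counting execs by zone (much like sar's exec/s, but broken
down) is a one-liner,

  # dtrace -n 'sysinfo:::sysexec { @[zonename] = count(); }'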

Alan Hargreaves (from memory) posted the following,

        http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6199092

which suggests separating the cpu_* structs from the CPUs, so that they
can be used to track many other categories, such as by zone.

A number of scripts from the DTraceToolkit already provide per-zone
statistics, since they're trivial to retrieve; e.g., zvmstat.
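
Per-user, per-project and per-zone is much the same from the command
line; from memory the psinfo translator exposes pr_projid, so something
like this (untested) counts syscalls by zone, UID and project,

  # dtrace -n 'syscall:::entry { @[zonename, uid, curpsinfo->pr_projid] = count(); }'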

I think the bottom line is that it depends on what details you are
interested in. procfs already has project and zone info and a swag of
resource counters, so an ordinary procfs tool may be a solution (an
enhancement to prstat/ps).

cheers,

Brendan

[Sydney, Australia]
