G'Day Mike,

On Wed, 2 Nov 2005, Mike Gerdts wrote:
> One of the more difficult performance monitoring problems that I have
> come across is determining the impact of multiple workloads running on
> a server.  Consider a server that has about 1000 database processes
> that are long running - many minutes to many months - mixed with batch
> jobs written in Bourne shell.  Largely due to the batch jobs, it is
> not uncommon for sar to report hundreds of forks and execs per second.
>
> There is somewhat of a knee-jerk reaction to move the batch jobs off
> of the database server.  However, quantifying how much of an impact
> this would have is somewhat hard to do.  Trying to use "prstat -a" or
> "prstat -J" does not seem to give a very accurate picture.  My guess
> is that prstat will tend to miss out on all of the processes that were
> very short lived.

procfs based tools miss out on short lived processes (as a separate
process entry, anyway) due to sampling. "prstat -m" can do something
spooky with short lived processes, for example,

     PID USERNAME  USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/NLWP
    6394 root       14  71 0.1 0.0 0.0 0.0 0.0  15   0 593 35K   0 /0
     937 root      2.2 8.4 0.0 0.0 0.0 0.0 4.2  85  26  24 11K 428 bash/1

PID 6394 has no name and no LWPs - a ghost process. (And a fairly busy
one too, 35000 syscalls... Actually, I shouldn't criticise this, as it
may well be a deliberate aggregation of short lived processes, and I've
found it to be quite handy. Thank you, ghost process!)

Now, the by-child usr/sys times from procfs should give us a crack at
solving this; they can be fetched using the -b option of,

   http://www.brendangregg.com/Solaris/prusage

however I suspect they undercount usr/sys time. (Now that the
OpenSolaris code is public I ought to go and read how they are
incremented)...

> The best solution that I have come up with is to write extended
> accounting records (task) every few minutes, then to process the
> exacct file afterwards.  Writing the code to write exacct records
> periodically and make sense of them later is far from trivial.  It is
> also impractical for multiple users (monitoring frameworks,
> administrators, etc.) to make use of this approach on the same machine
> at the same time due to the fact that the exacct records need to be
> written and this is presumably a somewhat expensive operation to do
> too often.

Wow, are you baiting me to talk about DTrace? ;-)

Actually, nice idea with exacct; there are a few other things to try on
earlier Solaris releases to shed light on this problem (TNF tracing,
BSM auditing, ...).

Anyway, for SHORT lived processes use shortlived.d from the
DTraceToolkit,

   # shortlived.d
   Sampling.. Hit Ctrl-C to stop.
   ^C
   short lived processes:      0.456 secs
   total sample duration:      9.352 secs

   Total time by process name,
            date              12 ms
            df                20 ms
            ls                40 ms
            perl             380 ms

   Total time by PPID,
            3279             452 ms

I guess I should print out percentages as well - 0.456 from 9.352
seconds is around 5%... Also run execsnoop, as it's entertaining.

shortlived.d is handy for troubleshooting the problem in person;
however, if you want to write a daemon that permanently logs this info
you'll need to modify it somewhat. It doesn't enable probes that are
likely to fire terribly rapidly, so the performance impact should be
negligible. My biggest worry would be the DTrace daemon dying due to
unresponsiveness, and you missing out on log entries (better wrap it in
SMF). Although it's not really a big worry, as that usually only
happens after you kill -23 DTrace.
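If it helps, the core of such a logger is only a handful of probes.
Here's a rough, untested sketch (not the actual shortlived.d source -
the probe choice and the 60 second flush interval are just my
assumptions for illustration):

   #!/usr/sbin/dtrace -s
   /*
    * Sketch: sum the lifespan (exec to exit) of processes that both
    * start and finish while tracing, and print totals by program name
    * every 60 seconds. Untested; for illustration only.
    */
   #pragma D option quiet

   proc:::exec-success
   {
           /* remember when this process image started */
           birth[pid] = timestamp;
   }

   proc:::exit
   /birth[pid]/
   {
           /* elapsed ns from exec to exit, summed per program name */
           @lifespan[execname] = sum(timestamp - birth[pid]);
           birth[pid] = 0;
   }

   profile:::tick-60sec
   {
           printf("%Y\n", walltimestamp);
           printa("   %-16s %@d ns\n", @lifespan);
           trunc(@lifespan);
   }

Redirect the output to a log file and wrap the whole thing in an SMF
service so it gets restarted if dtrace exits.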
> It seems as though it should be possible for the kernel to maintain
> per-user, per-project, and per-zone statistics.  Perhaps collecting
> them all the time is not desirable, but it seems as though updating
> the three sets of statistics for each context switch would be lighter
> weight than writing accounting records then post processing them.  The
> side effect of having this data available would be that tools like
> prstat could report accurate data.  Other tools could likely get this
> data through kstat or a similar interface.

Kstat currently provides a number of CPU-related goodies; see
/usr/include/sys/sysinfo.h for the cpu_* structs. Alan Hargreaves (from
memory) posted the following,

   http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6199092

which suggests separating the cpu_* structs from the CPUs, so that they
can be used to track many other categories, such as by zone.

A number of scripts from the DTraceToolkit already provide per-zone
statistics, since that's trivial to retrieve. Eg, zvmstat (there's a
bare-bones example of the technique in the PS below).

I think the bottom line is that it depends on what details you are
interested in. procfs already has project and zone info and a swag of
resource counters, so an ordinary procfs tool may be a solution (an
enhancement to prstat/ps)...

cheers,

Brendan

[Sydney, Australia]
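PS. the per-zone technique really is just the zonename built-in doing
all the work. A bare-bones sketch (untested, and not zvmstat itself -
from memory zvmstat does much the same thing against the vminfo
provider) would be:

   #!/usr/sbin/dtrace -s
   /*
    * Sketch: count system calls by zone name and print a summary
    * every 5 seconds. Untested; for illustration only.
    */
   #pragma D option quiet

   syscall:::entry
   {
           @calls[zonename] = count();
   }

   profile:::tick-5sec
   {
           printf("\n%Y\n", walltimestamp);
           printf("   %-24s %12s\n", "ZONE", "SYSCALLS");
           printa("   %-24s %@12d\n", @calls);
           trunc(@calls);
   }

Swap the syscall probe for whatever event you care about (vminfo,
sysinfo, io, ...) and you have a per-zone version of most of the usual
one-liners.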