The LAT microstate represents the percentage of time the threads were sitting
on a run queue (runnable) waiting for their turn on a CPU. The BIBusTKServer
processes are, for the most part, either running in USR mode (while on a CPU)
or waiting for a CPU. Each process has several threads (on the order of 20),
so each prstat row represents the CPU wait time for all the threads in the
process. Each server is also doing about 3k-5k system calls per second, but
those must be non-IO syscalls, since there's no appreciable SYS time or
SLP time.
So... what does the "r" column in vmstat look like? That's the system-wide
view of run queue depth. Is it consistently at or over 100?
How many CPUs are on this system?
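If it helps, here is a rough Python sketch for pulling the "r" column out of
captured vmstat output and checking whether it stays at or above some
threshold. The function names and the sample text are made up for
illustration (the numbers are not from your system), and the parser simply
assumes "r" is the first field of each data row.

```python
def run_queue_samples(vmstat_text, skip_first=True):
    """Return the 'r' (run queue) values from captured vmstat output.

    Assumes 'r' is the first field of every data row. The first data row
    is dropped by default because vmstat reports since-boot averages there.
    """
    samples = []
    for line in vmstat_text.splitlines():
        fields = line.split()
        # Data rows start with a number; the two header rows do not.
        if fields and fields[0].isdigit():
            samples.append(int(fields[0]))
    return samples[1:] if skip_first else samples


def consistently_at_or_over(samples, threshold):
    """True if there is at least one sample and all are >= threshold."""
    return bool(samples) and all(r >= threshold for r in samples)


# Hypothetical vmstat output, invented for illustration only.
SAMPLE = """\
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr s0 s1 s2 s3   in   sy   cs us sy id
 1 0 0 5000000 900000 0 5 0 0 0 0 0 0 0 0 0 400 9000 1000 20 2 78
 12 0 0 5000000 900000 0 5 0 0 0 0 0 0 0 0 0 800 30000 4000 97 3 0
 15 0 0 5000000 900000 0 5 0 0 0 0 0 0 0 0 0 820 31000 4100 98 2 0
"""

r_values = run_queue_samples(SAMPLE)
ncpus = 4  # hypothetical CPU count; substitute the real one
print("r samples:", r_values)
print("run queue always >= CPU count:", consistently_at_or_over(r_values, ncpus))
```

Comparing "r" against the CPU count is only a first-pass indicator; a run
queue that sits well above the number of CPUs is what I'd expect to see given
the LAT numbers.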
Your question is a good one, but it's very difficult to predict how much the
addition of CPUs (or a change to faster CPUs) will ultimately impact
the statistics, and, much more importantly, the application performance.
Which brings me to the next question - is the application performing well,
or are you chasing a performance problem?
Not knowing anything more about the workload, it certainly seems obvious
that more processors will help, in terms of reducing CPU wait time. When that
happens, more threads can run concurrently, and how that ultimately helps
(or hurts) throughput depends on application design, locking and
dependencies.
My knee-jerk reaction is that I would want at least a 16-way system to
run the BIBusTKServer processes. Looks like a decent candidate for the T2000
(32 threads), assuming it meets other constraints (next to zero floating
point, for example).
- What system is this running on now?
- What version of Solaris?
- How is overall workload/application performance, currently?
Theoretically, with enough CPUs (cores) to run each thread, LAT time
would drop to near zero - the problem is we don't know if all those
running threads will contend on something else (at least I don't :^).
It's unrealistic (probably) to suggest 160 or so cores!
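To put rough numbers on that, here is a small Python sketch of the simple
arithmetic from your message, applied to the top nine rows of the prstat -mL
sample quoted below. It assumes (and this is a big assumption) that every
percent of LAT would turn into on-CPU time if a CPU were available, so treat
the second figure as an upper bound rather than a prediction. The function
name is mine.

```python
# (USR, SYS, LAT) percentages from the top nine rows of the quoted
# prstat -mL sample: eight BIBusTKServe rows plus the busiest oracle row.
ROWS = [
    (53, 0.5, 45), (51, 0.5, 45), (43, 0.6, 54),
    (43, 0.6, 55), (39, 0.6, 59), (39, 0.4, 59),
    (35, 0.4, 64), (34, 0.4, 64), (28, 1.2, 62),
]


def cpu_demand(rows):
    """Return (cpus_in_use, cpus_wanted) for a list of (usr, sys, lat) rows.

    cpus_in_use is the summed on-CPU time (USR + SYS) divided by 100.
    cpus_wanted adds the summed LAT, assuming all run-queue wait would
    become on-CPU time. That assumption makes it an upper bound.
    """
    on_cpu = sum(usr + sys_pct for usr, sys_pct, _ in rows)
    waiting = sum(lat for _, _, lat in rows)
    return on_cpu / 100.0, (on_cpu + waiting) / 100.0


in_use, wanted = cpu_demand(ROWS)
print(f"CPUs in use: {in_use:.1f}, upper-bound CPU demand: {wanted:.1f}")
```

For these nine rows that works out to a bit under 4 CPUs' worth of work
actually running and a bit under 9 CPUs' worth of demand. It ignores every
row not listed, and it says nothing about whether the threads would contend
on a lock or something else once they all get a CPU.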
Also, does the workload require 8 threaded BIBusTKServer processes?
Not knowing anything about the design, it's possible you could run
fewer Server processes that would reduce CPU contention while
maintaining throughput (whether that's possible or not is of course
completely dependent on the application design).
HTH - please follow up with us on the questions above, and let's
see where it goes.
/jim
Glen Gunselman wrote:
We have an overloaded server (V490 with one CPU board) - CPU bound.
Here is a sample prstat -mL taken during a time of high load (uptime
Total: 278 processes, 1710 lwps, load averages: 20.72, 13.21, 6.74):
  PID USERNAME  USR  SYS  TRP  TFL  DFL  LCK  SLP  LAT  VCX  ICX  SCL  SIG PROCESS/LWPID
 5617 cognos8    53  0.5  0.0  0.0  0.0  2.1  0.0   45   1K  200   3K    0 BIBusTKServe/18
 5617 cognos8    51  0.5  0.0  0.0  0.0  3.6  0.0   45   1K  274   3K    0 BIBusTKServe/17
 6084 cognos8    43  0.6  0.0  0.0  0.0  1.9  0.0   54   2K  222   5K    0 BIBusTKServe/20
 6084 cognos8    43  0.6  0.0  0.0  0.0  1.1  0.0   55   1K  244   4K    0 BIBusTKServe/15
 6084 cognos8    39  0.6  0.0  0.0  0.0  1.8  0.0   59   2K  212   4K    0 BIBusTKServe/22
 5617 cognos8    39  0.4  0.0  0.0  0.0  1.4  0.0   59   1K  223   3K    0 BIBusTKServe/22
 6084 cognos8    35  0.4  0.0  0.0  0.0  1.1  0.0   64   1K  262   2K    0 BIBusTKServe/19
 5617 cognos8    34  0.4  0.0  0.0  0.0  2.2  0.0   64   1K  465   2K    0 BIBusTKServe/23
29514 oracle     28  1.2  0.1  0.0  0.0  0.0  8.6   62  217  990  899    0 oracle/1
29948 root      2.4  0.4  0.0  0.0  0.0  0.0   77   20  109  561  961    0 cfagent/1
 5610 oracle    1.5  0.5  0.0  0.0  0.0  0.0   98  0.1    3    8  871    0 oracle/1
  942 oracle    1.2  0.6  0.0  0.0  0.0  0.0   98  0.0   15   50  506    0 oracle/1
 9378 root      0.4  1.1  0.1  0.0  0.0  0.0   98  0.9   40    9  994    0 prstat/1
 1475 oracle    1.1  0.2  0.4  0.0  0.0  0.0   98  0.2  111   55  945    0 emagent/3047304
11646 oracle    0.8  0.0  0.0  0.0  0.0  0.0   91  8.7    1   45   80    0 java/56
11479 oracle    0.6  0.1  0.0  0.0  0.0  0.0   98  1.0    4    4  615    0 oracle/1
10520 oracle    0.6  0.0  0.0  0.0  0.0  0.0   98  1.4    5    0   45    5 nmccollector/1
  835 sysnav    0.1  0.2  0.1  0.0  0.0  0.0   57   42   19  240  471    0 bb-local.sh/1
 7375 oracle    0.2  0.0  0.0  0.0  0.0  0.0  100  0.0    9    3  192    0 oracle/1
11712 oracle    0.2  0.0  0.0  0.0  0.0  0.0  100  0.0    8    2  178    0 oracle/1
11815 oracle    0.2  0.0  0.0  0.0  0.0  100  0.0  0.2    1    3   18    0 java/37
  576 root      0.1  0.1  0.0  0.0  0.0  0.0  100  0.1  331    1   1K    0 nscd/11
17855 oracle    0.1  0.0  0.0  0.0  0.0  100  0.0  0.1    5    0    5    0 java/2
11805 oracle    0.1  0.1  0.0  0.0  0.0  0.0   96  3.8    4    7   62    2 perl/1
11649 oracle    0.1  0.0  0.0  0.0  0.0  0.0  100  0.0    9    0  118    0 oracle/1
11780 oracle    0.0  0.1  0.0  0.0  0.0  0.0   92  8.3   52    0  354   47 webcached/1
    1 root      0.0  0.1  0.0  0.0  0.0  0.0  100  0.2   13    0  361   14 init/1
 4987 cognos8   0.0  0.1  0.0  0.0  0.0  0.0   57   43  338    4  232    0 java/5
 4972 cognos8   0.1  0.0  0.0  0.0  0.0  0.0   91  8.5   68    0   77    0 cogbootstrap/3
17855 oracle    0.0  0.1  0.0  0.0  0.0  0.0   51   49  312    2  209    0 java/5
From looking at the LAT column, how do I compute the CPU resources
needed to reduce LAT to more "normal" levels?
Page 24 of Solaris Performance and Tools includes the following
statement referring to LAT:
"This is an extremely useful metric--we can use it to estimate the
potential speedup for a thread if more CPU resources are added ..."
I have been unable to find any information on how to turn LAT into CPU
resources. I'm reluctant to use (USR + SYS (370.5 for the top 9 processes)
+ LAT (507 for the same top 9 processes)) / 100. This seems way too
simple.
Thanks
gleng
Glen Gunselman
Systems Software Specialist
TCS
Emporia State University
------------------------------------------------------------------------
_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org