As Martin mentioned, "unaccounted" could be multiple things. However, from the 
reports you've shown, my first guess would also be that it's likely related to 
CPU contention. 

I would start by looking at RMF III's DELAY, PROC, and ENCLAVE reports (ENCLAVE 
if we're talking about DDF threads) for the address spaces or enclaves in 
question. Note that you should always expect some amount of delay. But if 
you're looking at a DB2 thread for a batch job that shows a high amount of 
unaccounted time, and you find that job on RMF III's DELAY report with a high 
CPU delay percentage, then that's pretty good evidence that the cause of the 
unaccounted time is in fact CPU delay. Or mostly so.

Re. the question of LPAR Busy vs. MVS Busy: to be honest, I don't look at MVS 
Busy that much. LPAR Busy usually reduces to the LPAR's dispatch time / total 
online time of the LPAR's CPUs. MVS Busy generally reduces to (online time - 
wait time) / online time. The details are in the RMF Report Analysis book, 
along with some words about the relationship between them. 
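To make the two ratios concrete, here's a toy calculation with invented 
numbers (a 15-minute interval, 4 logical CPUs, and assumed dispatch and wait 
times; none of these figures come from your reports):

```python
# Toy illustration of the two ratios described above. All numbers are made up.
# Interval: 15 minutes (900 s) with 4 logical CPUs online to the LPAR.
online_time = 4 * 900.0    # total online CPU-seconds in the interval
dispatch_time = 2520.0     # seconds PR/SM dispatched this LPAR's logical CPUs (assumed)
wait_time = 720.0          # seconds z/OS spent in its wait loop (assumed)

lpar_busy = dispatch_time / online_time            # dispatch / online
mvs_busy = (online_time - wait_time) / online_time # (online - wait) / online

print(f"LPAR Busy: {lpar_busy:.0%}")  # 70%
print(f"MVS Busy:  {mvs_busy:.0%}")   # 80%
```

Note the two can diverge: here MVS Busy is higher because the wait-loop time 
z/OS sees is smaller than the time the logical CPUs went undispatched by PR/SM.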

I'm generally more interested in which workloads are waiting on CPU and which 
workloads are consuming CPU, and how much each LPAR is consuming (LPAR Busy). 
And on the RMF PP CPU report, I'm more interested in the distribution in the 
in-ready work unit queue graph. Ideally, the top line on that chart would be 
the longest and represent a significant percentage of the samples. In your 
case, the graph shows that about 60% of the samples have this LPAR with more 
than 6 work units waiting to be dispatched. I'd go to RMF III's PROC report 
for the time interval in question to get a feel for which address spaces are 
waiting the most, to determine whether I really care about that.
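The 60% figure is just the fraction of sampled intervals whose in-ready count 
exceeds the threshold. A minimal sketch of that arithmetic, using an invented 
list of per-sample in-ready counts (not your data):

```python
# Hypothetical in-ready queue depths, one per RMF sample cycle. Invented data.
samples = [2, 8, 7, 1, 9, 12, 3, 10, 5, 8]

threshold = 6  # work units waiting to be dispatched
frac = sum(1 for n in samples if n > threshold) / len(samples)
print(f"{frac:.0%} of samples had more than {threshold} work units in-ready")  # 60%
```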

If the next question is "how do we improve things", that's a more difficult 
question with lots of possible answers. In general, either reduce consumption 
by finding some tuning opportunities or add resources (hardware). The latter 
may cost more than the former. Or maybe not. 

Finally, my opinion is that machines with a single CPU are more likely to be 
unhappy machines. Queue delay sets in faster and more severely with a single 
CPU than with even two or three CPUs. And the more different things a CPU has 
to do in short succession, the less efficient its cache is. My experience 
(with my workloads) has been that more/slower CPUs have generally performed 
better than fewer/faster CPUs. Parallel sysplex (apparently not involved 
here) can magnify that situation, in that a fast and a slow CPU will spin for 
pretty close to the same time waiting on sync requests.

There are obviously considerations in moving to more/slower CPUs, but it's one 
of those things that definitely should be looked at during an upgrade cycle. 
Ideally, you'd bring a machine in with OOCoD, with a "shipped as" capacity and 
a "paid-for" capacity, such that you could try multiple variations to see what 
works best. Unfortunately, IBM's current policy on how you pay for 
maintenance, and not being able to pre-pay maintenance on OOCoD capacity, 
makes that harder to do. And ISV software that inspects the CPU type can cause 
problems. But in general, playing with capacity settings via OOCoD is easy to 
do. 

Scott

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN