Re: Is there a source for detailed, instruction-level performance info?

Blaicher, Christopher Y. Thu, 24 Dec 2015 09:23:37 -0800

I have looked at the public documentation on the z13 and had the privilege to 
speak to some of the people behind parts of it, and it is an amazing machine.
The reason you can't say how long an instruction takes is that in many cases 
things are happening A) out of sequence; B) at the same time and C) dependent 
on cache hit ratios.
A z13 can be looking at up to about 50 instructions to see if there is anything 
it can do.  If one of those instructions is not dependent on something yet to 
be done, it will do it and hold on to the result until needed.  It also has 
many more registers than the 16 we think of, so if you have code using R1 and 
that is followed by a LHI R1,n instruction it may use R1 for the leading-in 
code and use one of the extra registers, call it register X27, to hold the 
value of the LHI.  When that value is needed the X27 register is used instead.  
The machine remembers all this.
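The renaming described above can be sketched with a toy model. This is only an illustration of the general idea, not the z13's actual mechanism: the physical register names (X16, X17, ...), the tuple instruction format and the rename() helper are all made up for the example.

```python
# Toy register-renaming model. Physical register names and the instruction
# format are hypothetical, chosen only to illustrate the idea.
from itertools import count

def rename(instructions, arch_regs=16):
    """Give every register write a fresh physical register.

    instructions: list of (dest, sources) tuples using names like "R1".
    Returns the same instructions expressed in physical register names.
    """
    # Initially architected register Rn lives in physical register Xn.
    mapping = {f"R{i}": f"X{i}" for i in range(arch_regs)}
    fresh = count(arch_regs)  # next unused physical register number
    renamed = []
    for dest, sources in instructions:
        # Sources read whichever physical register currently holds the value.
        phys_sources = tuple(mapping[s] for s in sources)
        # The write goes to a new physical register, so earlier readers of
        # the same architected register are not disturbed.
        mapping[dest] = f"X{next(fresh)}"
        renamed.append((mapping[dest], phys_sources))
    return renamed

program = [
    ("R2", ("R1", "R3")),  # leading-in code reads the old R1
    ("R1", ()),            # LHI R1,n: writes R1, reads nothing
    ("R4", ("R1",)),       # later code reads the new R1
]
for dest, sources in rename(program):
    print(dest, "<-", sources)
```

Because the LHI result lands in its own physical register, it does not have to wait for the leading-in code to finish with the old R1.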
Also, the z13 can be working on 6 instructions in parallel.
One of the big pains for the z13 is an unpredictable branch.  That is one that 
goes one way this time and the other way the next.  The machine has a lot of 
branch prediction stuff (didn't know what else to call it) in it so that it 
tries to know where it will go, but if it predicts wrong, there is a 26 cycle 
cost, and when you consider that hits at least 6 instructions, that is a 
non-trivial expense.
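To put rough numbers on that, here is a back-of-the-envelope sketch using the 26-cycle penalty and the 6-wide figure quoted above. The function names, the one-cycle base cost and the assumption that all 6 slots would otherwise have been filled are mine, so treat the result as an upper bound rather than a measurement.

```python
def lost_instruction_slots(penalty_cycles=26, instructions_per_cycle=6):
    """Upper bound on instruction slots wasted by one mispredicted branch,
    assuming every cycle could otherwise have worked on the full width."""
    return penalty_cycles * instructions_per_cycle

def effective_cycles_per_branch(mispredict_rate, penalty_cycles=26,
                                base_cycles=1.0):
    """Average cost of a branch given how often it is mispredicted.
    base_cycles is a hypothetical cost for a correctly predicted branch."""
    return base_cycles + mispredict_rate * penalty_cycles

print(lost_instruction_slots())           # slots lost per misprediction
print(effective_cycles_per_branch(0.0))   # always predicted correctly
print(effective_cycles_per_branch(0.5))   # coin-flip branch
```

Under these assumptions a branch that goes one way half the time costs on the order of ten times more, on average, than one the predictor always gets right.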
That brings us to cache hit ratios.  A 1/10th of a second wait may seem like 
less than a blink of an eye, but it is forever in our high speed machines.  In 
that time all your code and data have probably been purged out of level 1 
cache, and maybe out of level 2 cache.  Bringing it back into cache takes time: 
a few cycles for level 1, and a factor more for each level away from level 1.
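A simple expected-latency model shows why the hit ratio matters so much. All latencies and hit rates in this sketch are invented for illustration and are not z13 figures; average_access_cycles() is a hypothetical helper.

```python
def average_access_cycles(levels, memory_cycles):
    """Expected cycles per storage reference.

    levels: list of (hit_rate, latency_cycles), ordered from level 1 outward.
            hit_rate is the fraction of references reaching that level
            that are satisfied there.
    memory_cycles: latency when every cache level misses.
    """
    expected = 0.0
    reach = 1.0  # probability that a reference gets as far as this level
    for hit_rate, latency in levels:
        expected += reach * hit_rate * latency
        reach *= 1.0 - hit_rate
    return expected + reach * memory_cycles

# Hypothetical numbers: a "warm" program that mostly hits level 1, and the
# same program after its data has been purged from the nearer caches.
warm = [(0.98, 4), (0.90, 12), (0.90, 50)]
cold = [(0.50, 4), (0.50, 12), (0.90, 50)]
print(average_access_cycles(warm, memory_cycles=400))
print(average_access_cycles(cold, memory_cycles=400))
```

The instruction stream is identical in both cases; only the assumed hit rates differ, and the expected cost per reference changes by a large factor.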
Today it is impossible to say how long an instruction takes.  It is even 
impossible to say how long a process takes because it varies based on what is 
in cache at the time.
Another thing that affects things is whether you get dispatched on the same 
processor or not.  If not, then all the level 1 cache has to be reloaded.
Bottom line, instruction speed is almost meaningless.  You have to look at it 
from a workload perspective.


Chris Blaicher
Technical Architect
Software Development
Syncsort Incorporated
50 Tice Boulevard, Woodcliff Lake, NJ 07677
P: 201-930-8234  |  M: 512-627-3803
E: [email protected]


-----Original Message-----
From: IBM Mainframe Discussion List [mailto:[email protected]] On Behalf 
Of Charles Mills
Sent: Thursday, December 24, 2015 9:51 AM
To: [email protected]
Subject: Re: Is there a source for detailed, instruction-level performance info?

Not so simple anymore.

"How long does a store halfword take?" used to be a question that had an 
answer. It no longer does.

My working rule of thumb (admittedly grossly oversimplified) is "instructions 
take no time, storage references take forever." I have heard it said that 
storage is the new DASD. This is so true that the z13 processors implement 
a kind of "internal multiprogramming" so that one CPU internal thread can do 
something useful while another thread is waiting for a storage reference.

Here is an example of how complex it is. I am responsible for an "event" or 
transaction driven program. I of course have test programs that will run events 
through the subject software. How many microseconds does each event consume? 
One surprising factor is how fast you push the events through.
If I max out the speed of event generation (as opposed to, say, one event every 
tenth of a second) then on a real-world shared Z the microseconds of CPU per 
event falls in HALF! Same exact sequence of instructions -- half the CPU time! 
Why? My presumption is that if the program is running flat out it "owns" the 
caches and there is much less processor "wait" (for instruction and data fetch, 
not ECB type wait) time.

Charles

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:[email protected]] On Behalf 
Of Thomas Kern
Sent: Wednesday, December 23, 2015 5:28 PM
To: [email protected]
Subject: Re: Is there a source for detailed, instruction-level performance info?

Perhaps what might be useful would be an assembler program to run loops of 
individual instructions and output some timing information.
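The shape of such a timing harness might look like the sketch below. It is written in Python rather than assembler purely to keep it short, so interpreter overhead dominates and it says nothing about individual instruction timings; time_repeated() and busy_loop() are hypothetical names. What it does show is how to collect repeated timings of the same work and look at the spread instead of a single number.

```python
import statistics
import time

def time_repeated(work, runs=20):
    """Time the same piece of work several times; return elapsed ns per run."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter_ns()
        work()
        samples.append(time.perf_counter_ns() - start)
    return samples

def busy_loop(n=100_000):
    """Stand-in for 'a loop of one instruction': the same work every call."""
    total = 0
    for i in range(n):
        total += i & 0xFFFF
    return total

samples = time_repeated(busy_loop)
print("min   ", min(samples), "ns")
print("median", statistics.median(samples), "ns")
print("max   ", max(samples), "ns")
```

Identical work will typically report different times from run to run, which is why a single "this instruction takes N" figure is hard to extract this way.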

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions, send email to 
[email protected] with the message: 
INFO IBM-MAIN



