I have looked at the public documentation on the z13 and had the privilege of speaking to some of the people behind parts of it, and it is an amazing machine.

The reason you can't say how long an instruction takes is that in many cases things are happening A) out of sequence, B) at the same time, and C) dependent on cache hit ratios.

A z13 can be looking at up to about 50 instructions to see if there is anything it can do. If one of those instructions is not dependent on something yet to be done, it will execute it and hold on to the result until needed. It also has many more registers than the 16 we think of, so if you have code using R1 and that is followed by an LHI R1,n instruction, it may keep using R1 for the lead-in code and use one of the extra registers, call it register X27, to hold the value of the LHI. When that value is needed, the X27 register is used instead. The machine remembers all this. Also, the z13 can be working on 6 instructions in parallel.

One of the big pains for the z13 is an unpredictable branch, that is, one that goes one way this time and the other way the next. The machine has a lot of branch prediction stuff (didn't know what else to call it) in it so that it tries to know where it will go, but if it predicts wrong, there is a 26-cycle cost, and when you consider that it hits at least 6 instructions, that is a non-trivial expense.

That brings us to cache hit ratios. A 1/10th of a second wait may seem like less than a blink of an eye, but it is forever in our high-speed machines. In that time all your code and data have probably been purged out of level 1 cache, and maybe out of level 2 cache. Bringing it back into cache takes time: a few cycles for level 1, and a factor more for each level away from level 1. Today it is impossible to say how long an instruction takes. It is even impossible to say how long a process takes because it varies based on what is in cache at the time.
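To put rough numbers on the misprediction penalty described above, here is a back-of-the-envelope sketch in Python. It uses the 26-cycle penalty and the 6-in-parallel figure from this note. Treating every penalty cycle as 6 lost instruction slots is a simplifying assumption, and the function names and sample inputs (branch frequency, miss rate, base cycles per instruction) are made up for illustration, not z13 measurements.

```python
# Back-of-the-envelope model of branch misprediction cost.
# The two constants come from the description above; the function names
# and the sample inputs are hypothetical illustration only.

MISPREDICT_PENALTY_CYCLES = 26  # cost of one wrong prediction, in cycles
PARALLEL_INSTRUCTIONS = 6       # instructions that can be worked on at once


def lost_instruction_slots(penalty_cycles=MISPREDICT_PENALTY_CYCLES,
                           width=PARALLEL_INSTRUCTIONS):
    """Upper bound on instruction slots forgone by one misprediction."""
    return penalty_cycles * width


def cycles_per_instruction(base_cpi, branch_fraction, mispredict_rate,
                           penalty_cycles=MISPREDICT_PENALTY_CYCLES):
    """Average cycles per instruction once mispredictions are charged.

    base_cpi        -- cycles per instruction with perfect prediction
    branch_fraction -- fraction of instructions that are branches (0..1)
    mispredict_rate -- fraction of branches predicted wrong (0..1)
    """
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


if __name__ == "__main__":
    print(lost_instruction_slots())  # 26 * 6 = 156
    # Assumed inputs: 0.5 base CPI, 20% branches, 5% of them mispredicted.
    print(round(cycles_per_instruction(0.5, 0.20, 0.05), 2))
```

Even with a low assumed miss rate, the penalty term can be a large share of the total, which is why one badly behaved branch can matter more than the instructions around it.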
Another thing that affects things is whether you get dispatched on the same processor or not. If not, then all the level 1 cache has to be reloaded. Bottom line, instruction speed is almost meaningless. You have to look at it from a workload perspective.
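One way to take a workload view is to measure CPU time per unit of work under different pacing, rather than timing single instructions. The following is a minimal sketch in Python (not z/Architecture assembler); `cpu_microseconds_per_event` and `sample_event` are hypothetical names, and the sample event is only a stand-in for real work.

```python
import time


def cpu_microseconds_per_event(handler, n_events, gap_seconds=0.0):
    """Run handler n_events times and return CPU microseconds per event.

    gap_seconds is an idle wait between events. Sleeping does not consume
    CPU, so time.process_time() counts only the work itself.
    """
    if n_events <= 0:
        raise ValueError("n_events must be positive")
    start = time.process_time()
    for _ in range(n_events):
        handler()
        if gap_seconds > 0:
            time.sleep(gap_seconds)
    elapsed = time.process_time() - start
    return elapsed * 1_000_000 / n_events


def sample_event():
    """Hypothetical stand-in for one transaction: build and sum a list."""
    data = list(range(20_000))
    return sum(data)


if __name__ == "__main__":
    # Same work both times; only the pacing between events differs.
    flat_out = cpu_microseconds_per_event(sample_event, 200)
    paced = cpu_microseconds_per_event(sample_event, 200, gap_seconds=0.01)
    print(f"flat out: {flat_out:.1f} us/event")
    print(f"paced:    {paced:.1f} us/event")
```

Whether the two figures differ, and by how much, depends on the machine and on what else is running at the time. The sketch only shows that pacing is a variable worth controlling when comparing runs.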
Chris Blaicher
Technical Architect
Software Development
Syncsort Incorporated
50 Tice Boulevard, Woodcliff Lake, NJ 07677
P: 201-930-8234 | M: 512-627-3803
E: [email protected]

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:[email protected]] On Behalf Of Charles Mills
Sent: Thursday, December 24, 2015 9:51 AM
To: [email protected]
Subject: Re: Is there a source for detailed, instruction-level performance info?

Not so simple anymore. "How long does a store halfword take?" used to be a question that had an answer. It no longer does.

My working rule of thumb (admittedly grossly oversimplified) is "instructions take no time, storage references take forever." I have heard it said that storage is the new DASD. This is true so much that the z13 processors implement a kind of "internal multiprogramming" so that one CPU internal thread can do something useful while another thread is waiting for a storage reference.

Here is an example of how complex it is. I am responsible for an "event" or transaction driven program. I of course have test programs that will run events through the subject software. How many microseconds does each event consume? One surprising factor is how fast you push the events through. If I max out the speed of event generation (as opposed to, say, one event every tenth of a second) then on a real-world shared Z the microseconds of CPU per event falls in HALF! Same exact sequence of instructions -- half the CPU time! Why? My presumption is that if the program is running flat out it "owns" the caches and there is much less processor "wait" (for instruction and data fetch, not ECB type wait) time.

Charles

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:[email protected]] On Behalf Of Thomas Kern
Sent: Wednesday, December 23, 2015 5:28 PM
To: [email protected]
Subject: Re: Is there a source for detailed, instruction-level performance info?

Perhaps what might be useful would be an assembler program to run loops of individual instructions and output some timing information.

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions, send email to [email protected] with the message: INFO IBM-MAIN
