On 14/04/2012 12:24 AM, Tony Harminc wrote:
<snip>
But it may be that when writing high performance assembler routines it is now a lot harder to win a battle with a compiler that has advanced knowledge of the underlying machine internals.

Tony H.
----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN

The writing was on the wall a long time ago, as documented in the Redbooks: http://www.redbooks.ibm.com/redbooks/pdfs/sg246515.pdf

<quote>
There is one programming aspect that is relevant, although only slightly linked to the use of a split cache. For many years, it has been an axiom among S/360 - S/390 users that assembly language programmers probably produce faster code than high-level language compilers. This is no longer true. Processors that use pipelines (including z800 and z900 machines) require a certain amount of nonsequential code to obtain the best performance. For example, if an instruction loads a register and the next instruction uses the register, we do not have optimum code. This sequence will stall the pipeline for several processor cycles. (The instructions work correctly, of course, but they take longer than necessary.) The best technique is to interleave several unrelated instructions between loading a register and using the new contents of the register.
</quote>
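To make the load-use stall in the quoted passage concrete, here is a toy in-order pipeline model in Python. It is only a sketch under stated assumptions: the 3-cycle load-use latency, the register names and the storage labels A and B are hypothetical, base-register dependencies are ignored, and nothing here reflects real z800/z900 timings. It compares loading a register and using it immediately against the same instructions with the two loads moved ahead of their uses.

```python
# Toy in-order pipeline: one instruction issues per cycle, and an instruction
# stalls until every register it reads is ready.
# ASSUMPTION: the latencies below are made-up figures for illustration only,
# not real z800/z900 timings. Base-register dependencies are ignored.
from dataclasses import dataclass
from typing import Optional, Tuple

LOAD_USE_LATENCY = 3  # hypothetical: cycles before a loaded value is usable
ALU_LATENCY = 1       # hypothetical: register-register result usable next cycle


@dataclass
class Insn:
    text: str              # assembler-like text, for display only
    dest: Optional[str]    # register written, or None
    srcs: Tuple[str, ...]  # registers read
    is_load: bool = False


def count_cycles(program):
    """Return (total_cycles, stall_cycles) for the toy pipeline."""
    ready = {}  # register -> first cycle its new value can be used
    cycle = 0
    stalls = 0
    for insn in program:
        # Wait until every source register's value is available.
        earliest = max((ready.get(r, 0) for r in insn.srcs), default=0)
        if earliest > cycle:
            stalls += earliest - cycle
            cycle = earliest
        if insn.dest is not None:
            latency = LOAD_USE_LATENCY if insn.is_load else ALU_LATENCY
            ready[insn.dest] = cycle + latency
        cycle += 1  # advance past the cycle in which this instruction issued
    return cycle, stalls


# Load a register, then use it in the very next instruction (twice over).
NAIVE = [
    Insn("L   R1,A", "R1", (), is_load=True),
    Insn("AR  R1,R2", "R1", ("R1", "R2")),
    Insn("L   R3,B", "R3", (), is_load=True),
    Insn("AR  R3,R4", "R3", ("R3", "R4")),
]

# Same work, with the second (unrelated) load interleaved before the first use.
INTERLEAVED = [NAIVE[0], NAIVE[2], NAIVE[1], NAIVE[3]]

if __name__ == "__main__":
    for name, prog in (("naive", NAIVE), ("interleaved", INTERLEAVED)):
        total, stalls = count_cycles(prog)
        print(f"{name}: {total} cycles, {stalls} stall cycles")
```

Under these assumed latencies the naive ordering takes 8 cycles, 4 of them stalls, while the interleaved ordering takes 5 cycles with 1 stall; with enough unrelated work between each load and its use the stall would disappear entirely. That kind of instruction scheduling is tedious to do by hand but is routine for an optimizing compiler.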
