On 14/04/2012 12:24 AM, Tony Harminc wrote:
<snip>
But it may be that when writing high performance assembler routines it
is now a lot harder to win a battle with a compiler that has advanced
knowledge of the underlying machine internals.
Tony H.
----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions, send
email to [email protected] with the message: INFO IBM-MAIN
The writing was on the wall a long time ago, as documented in this
Redbook: http://www.redbooks.ibm.com/redbooks/pdfs/sg246515.pdf
<quote>
There is one programming aspect that is relevant, although only slightly
linked to the use of a split cache. For many years, it has been an axiom
among S/360 - S/390 users that assembly language programmers probably
produce faster code than high-level language compilers. This is no
longer true. Processors that use pipelines (including z800 and z900
machines) require a certain amount of nonsequential code to obtain the
best performance. For example, if an instruction loads a register and
the next instruction uses the register, we do not have optimum code.
This sequence will stall the pipeline for several processor cycles. (The
instructions work correctly, of course, but they take longer than
necessary.) The best technique is to interleave several unrelated
instructions between loading a register and using the new contents of
the register.
</quote>
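
To put rough numbers on the stall the Redbook describes, here is a toy
model in Python. It is only a sketch under my own assumptions: strictly
in-order issue of one instruction per cycle, a made-up load-to-use
latency of 3 cycles, and register names picked for illustration. None of
these figures come from the Redbook or from any real z800/z900.

```python
# Toy in-order pipeline model. An illustration only, not a model of any
# real processor. Each instruction is (dest_register, source_registers, is_load).
LOAD_USE_LATENCY = 3  # assumed: cycles before a loaded value can be used


def total_cycles(program, load_latency=LOAD_USE_LATENCY):
    """Issue one instruction per cycle, in order, stalling while any
    source register is still waiting on an earlier load."""
    ready_at = {}  # register -> first cycle in which its value is usable
    cycle = 0
    for dest, sources, is_load in program:
        # The instruction cannot issue before all of its sources are ready.
        issue = max([cycle] + [ready_at.get(reg, 0) for reg in sources])
        ready_at[dest] = issue + (load_latency if is_load else 1)
        cycle = issue + 1
    return cycle


# Load, then use the register in the very next instruction (twice).
back_to_back = [
    ("R1", [], True),             # load R1
    ("R2", ["R1", "R2"], False),  # add R1 into R2 (waits on the load)
    ("R3", [], True),             # load R3
    ("R4", ["R3", "R4"], False),  # add R3 into R4 (waits on the load)
]

# Same work, with the two independent loads hoisted ahead of the adds.
interleaved = [
    ("R1", [], True),
    ("R3", [], True),
    ("R2", ["R1", "R2"], False),
    ("R4", ["R3", "R4"], False),
]

if __name__ == "__main__":
    print(total_cycles(back_to_back))  # 8
    print(total_cycles(interleaved))   # 5
```

In this model the same four instructions take 8 cycles back to back and
5 cycles once reordered. A compiler's instruction scheduler does this
kind of reordering routinely, using latency tables for the target
machine.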