[email protected] (John McKown) writes:
> There is no architectural restriction about not modifying instructions
> "on the fly". The z does not have the concept of "data" versus
> "instruction" storage. But, IMO, it is an abomination. There are two
> major reasons and one minor one. First, it causes a flush of the I
> (and D?) cache. This impacts performance quite a bit. Second, it makes
> the code not reentrant. And a minor point, due to not being reentrant,
>  is that the code cannot be placed in read-only memory. Rather than
> modifying an instruction on the fly, I either use an EX of the
> instruction, when possible; or I move a template of the
> instruction into a data area and EX that.

the claim has been made that 1/3rd of processor cycles for 370
instruction emulation went to checking whether an instruction already
fetched/decoded in the pipeline has been modified.

this applies to the avg. number of x86 instructions per emulated 370
instruction as well as to the older ibm 370 microcoded machines.

long ago and far away, I was asked to look at 10 impossible things in
a large (TPF) airline reservation system. I recoded an application that
was one of the biggest processor users ... in C and retargeted for
rs/6000 (as well as changing the implementation architecture and
paradigm). I initially got 20 times improvement over TPF ... however a
lot of the processing was looking at data structures that were small
enough to fit multiple per cache line. I re-organized the layout and the
instructions ... so that multiple structures were processed sequentially
in the same cache line (as opposed to randomly taking a cache miss for
every data structure access) ... and got another five times improvement
(for overall 100 times improvement over TPF). I then added a whole lot
of automated processing ... which reduced things back down to only ten
times TPF (but also reduced the number of human interactions by a factor
of three). At the end, all processing for every scheduled flight
in the world (not just for that airline) could be handled by ten
high-end rs/6000.

A decade later, this much processing was available on a smartphone.

even longer ago ... Jim Gray and I used to sit around friday nights
trying to come up with ways that might attract IBM middle management and
executives to using computers ... who at the time were almost all
totally computer illiterate. One of the things we came up with was an
online telephone directory ... however the baseline was that it had to
be faster than somebody reaching for a paper phonebook ... and the
implementation would have to take less than one week of each of our
times. Given that the approx. letter frequency was known for names, Jim
did a radix partition search which averaged out to much less I/O than
binary search (and the improvement got better as the size of the file
increased) ... aka 16,000 names is about 14 probes for binary ... but
with 50 names per record, radix partition search using first-two-letter
frequency ... could frequently get to the physical record in one or two
disk I/Os.
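
a rough sketch of the idea in python (not the original code, which isn't
shown here). assumptions: the two-letter frequency table is built from
the data itself as a stand-in for the known name statistics, a "record"
is just a list of 50 sorted names, and every read of a record counts as
one disk I/O. the function names are made up for illustration.

```python
import bisect
import math
from collections import Counter

NAMES_PER_RECORD = 50  # from the post: 50 names per physical record


def build_records(names):
    """Sort the names and pack them into fixed-size 'disk records'."""
    names = sorted(names)
    return [names[i:i + NAMES_PER_RECORD]
            for i in range(0, len(names), NAMES_PER_RECORD)]


def build_prefix_table(names):
    """Return (prefixes, cum): for each two-letter prefix, the fraction
    of all names that sort before it.

    Assumption: derived from the data itself, standing in for the known
    letter-frequency statistics.
    """
    counts = Counter(n[:2] for n in names)
    prefixes = sorted(counts)
    cum, running = [], 0
    for p in prefixes:
        cum.append(running / len(names))
        running += counts[p]
    return prefixes, cum


def lookup(records, prefixes, cum, name):
    """Return (found, probes); probes counts simulated record reads."""
    if not records:
        return False, 0
    # Estimate the record from the cumulative frequency of the prefix.
    i = bisect.bisect_left(prefixes, name[:2])
    frac = cum[i] if i < len(cum) else 1.0
    rec = min(int(frac * len(records)), len(records) - 1)
    probes, direction = 0, 0
    while True:
        probes += 1  # one simulated disk I/O
        block = records[rec]
        if name < block[0]:
            step = -1
        elif name > block[-1]:
            step = 1
        else:
            return name in block, probes
        nxt = rec + step
        # Stop at either end of the file, or when the walk reverses
        # (the name falls in the gap between two records).
        if nxt < 0 or nxt >= len(records) or step == -direction:
            return False, probes
        direction = step
        rec = nxt


def binary_probes(n):
    """Worst-case probes for a binary search over n sorted names."""
    return math.ceil(math.log2(n))
```

binary_probes(16000) gives the "about 14 probes" figure above. when the
estimate is off, the lookup just steps to the neighboring record, so a
less accurate frequency table costs extra I/Os rather than wrong answers.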

-- 
virtualization experience starting Jan1968, online at home since Mar1970
