[email protected] (Shmuel Metz, Seymour J.) writes:
> It may be true for simulation of the S/370 on Intel, but a real
> 370/168 handled it in the I-unit.

re:
http://www.garlic.com/~lynn/2013f.html#65 Linear search vs. Binary search

high-end machines were horizontal microcode with lots of extra hardware
... potentially overlapping multiple operations at once. As a result,
performance was measured in avg. machine cycles per instruction; 370/165
was 2.1 cycles/instruction, and improvements for the 168 got it down to
1.6 cycles/instruction. even though the 3033 started out as a remapping
of the 168 to 20% faster chips ... some further optimization got the
3033 to 50% faster than the 168 and the avg. down to about 1
cycle/instruction. It also made it hard to see any performance
improvement from microcode assists on the 3033 ... in some cases, they
actually ran slower than straight 370 (there were some claims that
various 3033 microcode assists were both slower and existed purely to
make things more difficult for clone processors).

the low & mid-range 370s were vertical microcode, with an implementation
looking much more like what is found in the x86 370 simulators ...
averaging 10 native instructions per 370 instruction. the early
microcode assist on the 370/148 got a 10:1 performance improvement
because 370 kernel instructions mapped to native instructions on a
1-for-1 basis. The criterion for 370/148 ECPS was that there were 6k
bytes of available microcode space ... and the highest-used kernel
pathlengths were redone in native instructions. An old post has the
study that measured all kernel pathlengths, sorted by percent of kernel
time. The highest-used 6k bytes of kernel instructions accounted for
79.55% of total time spent in the kernel
http://www.garlic.com/~lynn/94.html#21
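
A rough sketch of that kind of selection: sort the measured paths by
percent of kernel time and take each one that still fits in the
available microcode space. The path names, sizes, and percentages here
are invented for illustration (they are not the measured data from the
study), treating "6k bytes" as 6*1024 is an assumption, and
`pick_paths` is a hypothetical helper, not the method actually used.

```python
# Hypothetical kernel paths: (name, size in bytes, percent of kernel time).
# All values are made up for illustration; they are NOT the study's data.
PATHS = [
    ("dispatch", 1500, 30.0),
    ("page_fault", 2500, 25.0),
    ("io_interrupt", 2000, 20.0),
    ("free_storage", 1800, 10.0),
    ("misc", 4000, 15.0),
]


def pick_paths(paths, budget_bytes):
    """Walk paths from highest to lowest percent of kernel time, taking
    each one that still fits in the remaining microcode space."""
    chosen, used, covered = [], 0, 0.0
    for name, size, pct in sorted(paths, key=lambda p: p[2], reverse=True):
        if used + size <= budget_bytes:
            chosen.append(name)
            used += size
            covered += pct
    return chosen, used, covered


# Assumption: "6k bytes" is taken as 6 * 1024 bytes.
chosen, used, covered = pick_paths(PATHS, budget_bytes=6 * 1024)
print(chosen, used, covered)
```

With the invented numbers, the three hottest paths fit in the budget
and the two remaining ones are skipped because they no longer fit.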

dropping those paths into microcode at the 10:1 improvement resulted in
a reduction of approx. 72% in processor time spent in the kernel.
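
The approx. 72% is consistent with applying the 10:1 improvement to just
the 79.55% of kernel time that was moved, with the rest left unchanged.
A small check of that arithmetic follows; it assumes this is how the
figure was derived, and `kernel_time_saved` is a name made up for this
sketch.

```python
def kernel_time_saved(fraction_moved, speedup):
    """Fraction of the original kernel time removed when `fraction_moved`
    of it runs `speedup` times faster and the rest is unchanged."""
    return fraction_moved * (1.0 - 1.0 / speedup)


# 79.55% of kernel time moved into microcode at 10:1 (figures from the post).
saved = kernel_time_saved(fraction_moved=0.7955, speedup=10.0)
remaining = 1.0 - saved
print(f"saved {saved:.1%} of kernel time, {remaining:.1%} remains")
```

Nine tenths of 79.55% is a little under 72%, leaving a bit over 28% of
the original kernel time.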

I gave a number of presentations on the above at the local BAYBUNCH user
group meetings in the period when Amdahl was starting work on their
hypervisor (a large part of virtual machine support dropped into the
machine), and they had lots of questions and some comments about their
implementation details (the 3090 eventually had to respond with PR/SM,
which later morphed into LPAR).

One of the comments from the Amdahl group was that they had created
"macrocode" mode ... basically a special case of 370 instructions used
to implement lots of "hardware" features. One claim for "macrocode"
mode was that it was enormously easier to program than the machine's
underlying horizontal microcode ... and it was originally done to make
it trivial to respond to the array of microcode assists coming out of
IBM. The other comment was that "macrocode" mode didn't allow
self-modifying instructions ... and as a result ran faster than
standard 370 instructions.

-- 
virtualization experience starting Jan1968, online at home since Mar1970
