[email protected] (Shmuel Metz , Seymour J.) writes:
> It may be true for simulation of the S/370 on Intel, but a real
> 370/168 handled it in the I-unit.
re: http://www.garlic.com/~lynn/2013f.html#65 Linear search vs. Binary search

High-end machines were horizontal microcode with lots of extra hardware ... potentially overlapping multiple operations at once. As a result, performance was measured in avg. machine cycles per instruction; the 370/165 was 2.1 cycles/instruction, and improvements for the 168 got it down to 1.6 cycles/instruction. Even though the 3033 started out as a remapping of the 168 to 20% faster chips ... some further optimization got the 3033 to 50% faster than the 168 and an avg. of 1 cycle/instruction. It also made it hard to see any performance improvement from microcode assists on the 3033 ... in some cases they actually ran slower than straight 370 (some claims that various 3033 microcode assists were both slower and existed purely to make it more difficult for clone processors).

The low & mid-range 370s were vertical microcode, with an implementation looking much more like what is found in the x86 370 simulators ... with an avg. of 10 native instructions per 370 instruction. The early microcode assist on the 370/148 got a 10:1 performance improvement because 370 kernel instructions mapped to native instructions on a 1-for-1 basis.

Criteria for 370/148 ECPS was that there was 6k bytes of available microcode space ... and the highest used kernel pathlengths were redone in native instructions. Old post with study that measured all kernel pathlengths, sorted by percent of kernel time. The highest used 6k bytes of kernel instructions accounted for 79.55% of total time spent in the kernel:
http://www.garlic.com/~lynn/94.html#21

Dropping those pathlengths into microcode at a 10:1 improvement reduced the processor time spent in the kernel by approx. 72%.
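The approx. 72% figure follows from the two numbers above (79.55% of kernel time covered, 10:1 improvement on the covered part). A minimal sketch of that arithmetic; the function name and the Amdahl-style formula are illustrative assumptions of mine, not something taken from the original study:

```python
def kernel_time_saved(covered_fraction: float, speedup: float) -> float:
    """Fraction of the original kernel time eliminated when
    `covered_fraction` of that time runs `speedup` times faster.
    (Hypothetical helper, assuming the uncovered code is unchanged.)"""
    return covered_fraction * (1.0 - 1.0 / speedup)


# Figures from the post: 79.55% of kernel time, 10:1 improvement.
saved = kernel_time_saved(0.7955, 10.0)
remaining = 1.0 - saved

print(f"kernel time saved:     {saved:.3f}")
print(f"kernel time remaining: {remaining:.3f}")
```

Under these assumptions the saving comes out just under 0.72 of the original kernel time, which matches the approx. 72% quoted.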
I gave a number of presentations on the above at the local BAYBUNCH user group meetings in the period when Amdahl was starting work on their hypervisor (a large part of virtual machine dropped into the machine). They had lots of questions and some comments about their implementation details (eventually 3090 had to respond with PR/SM, which eventually morphs into LPAR).

One of the comments from the Amdahl group was that they had created "macrocode" mode ... basically a special case of 370 instructions for implementing lots of "hardware" features. One claim for "macrocode" mode was that it was enormously easier to program than the machine's underlying horizontal microcode ... and it was originally done to make it trivial to respond to the array of microcode assists coming out of IBM. The other comment was that "macrocode" mode didn't allow self-modifying instructions ... and as a result ran faster than standard 370 instructions.

--
virtualization experience starting Jan1968, online at home since Mar1970

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN
