On Tue, 18 Dec 2018 06:03:25 -0600, Elardus Engelbrecht wrote:

>Each CPU (on an IBM mainframe) consists of two halves. Both halves are
>executing an instruction and the results are compared.
>
>If there is a difference, then the instruction is retried. If still there is a
>difference, somehow the CPU is giving the instruction and the rest of the
>cache to another [unoccupied] CPU and then turns itself of and announce its
>own status to the hardware.
This was documented in the announcement for the 9672 G5 models. You can still find the sales manual entry at
http://www-01.ibm.com/common/ssi/ShowDoc.wss?docURL=/common/ssi/rep_sm/6/897/ENUS9672-_h06/index.html&request_locale=en
which contains this:

<quote>
Enhanced Processor Design

All S/390 G5 Servers are provided with an enhanced processor design. Each Central Processor contains dual Instruction / Execution Units, which operate simultaneously. Results are compared, and in the event of a miscompare, Instruction Retry is invoked. This design simplifies checking, and virtually eliminates CP failures due to soft errors.

Fault Tolerant Design

Fault tolerant design allows hardware recovery to be performed, in most cases, totally transparent to customer operation and eliminates the need for a repair action, or defers a repair action to a convenient time scheduled by the customer.
</quote>

It goes on to talk about processor sparing.

This was a new (at the time) kind of error checking, but error checking was part of the original design of the System/360. I would like to believe that the latest processors are designed the same way, but I don't know.

I am not familiar with earlier processors, but my impression is that error checking was not new with System/360. When computers were built using vacuum tubes, errors would have been commonplace.

Memory was once parity checked. That gave way to Error Checking and Correction (ECC). Today, in addition to ECC, IBM uses Redundant Array of Independent Memory (RAIM).

Processors have long used parity checking on the buses used to interconnect components. Considerable other circuitry is included in processors to detect errors, but I don't have any specifics.

Many of these techniques have been documented in the IBM Journal of Research and Development. Unfortunately, a few years ago IBM decided to hide that behind a paywall.
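For anyone who wants to see the compare-and-retry idea from the quote in concrete terms, here is a toy sketch in Python. It is only a software analogy under my own assumptions, not a description of how the hardware does it: the names run_with_retry and make_flaky_adder are made up, a "unit" is just a function, and the RuntimeError merely stands in for the point where the real machine would hand the work to a spare processor.

```python
def run_with_retry(unit_a, unit_b, operands, max_retries=1):
    """Run the same operation on two units and compare the results.

    Returns (result, attempts) once both units agree. If they still
    disagree after max_retries retries, raise RuntimeError as a stand-in
    for escalating to processor sparing (hypothetical, for illustration).
    """
    for attempt in range(1, max_retries + 2):
        result_a = unit_a(*operands)
        result_b = unit_b(*operands)
        if result_a == result_b:
            return result_a, attempt
    raise RuntimeError("persistent miscompare: would escalate to sparing")


def good_add(x, y):
    """A unit that always computes the correct sum."""
    return x + y


def make_flaky_adder(bad_calls):
    """Build an adder whose first bad_calls results have the low bit flipped."""
    calls = 0

    def add(x, y):
        nonlocal calls
        calls += 1
        result = x + y
        if calls <= bad_calls:
            result ^= 1  # simulate a transient (soft) error
        return result

    return add


if __name__ == "__main__":
    # One transient error: the first attempt miscompares, the retry agrees.
    print(run_with_retry(good_add, make_flaky_adder(1), (2, 3)))
```

With one simulated soft error, the call returns the correct sum on the second attempt. With a unit that keeps failing, the retry budget runs out and the exception is raised, which is roughly where sparing would take over in the real design.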
-- 
Tom Marchant