On Tue, 18 Dec 2018 06:03:25 -0600, Elardus Engelbrecht wrote:

>Each CPU (on an IBM mainframe) consists of two halves. Both halves are 
>executing an instruction and the results are compared.
>
>If there is a difference, then the instruction is retried. If still there is a 
>difference, somehow the CPU is giving the instruction and the rest of the 
>cache to another [unoccupied] CPU and then turns itself off and announces its 
>own status to the hardware.

This was documented in the announcement for the 9672 G5 models. You can still 
find the sales manual entry at 
http://www-01.ibm.com/common/ssi/ShowDoc.wss?docURL=/common/ssi/rep_sm/6/897/ENUS9672-_h06/index.html&request_locale=en
which contains this:

<quote>
Enhanced Processor Design

All S/390 G5 Servers are provided with an enhanced processor design. Each 
Central Processor contains dual Instruction / Execution Units, which operate 
simultaneously. Results are compared, and in the event of a miscompare, 
Instruction Retry is invoked. This design simplifies checking, and virtually 
eliminates CP failures due to soft errors.

Fault Tolerant Design

Fault tolerant design allows hardware recovery to be performed, in most cases, 
totally transparent to customer operation and eliminates the need for a repair 
action, or defers a repair action to a convenient time scheduled by the 
customer. 
</quote>

It goes on to talk about processor sparing.
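
For anyone who wants to see the compare, retry and spare sequence spelled out, 
here is a toy software model in Python. It is only a sketch of the idea in the 
quoted text, not a description of how the hardware does it. All of the names 
(good_unit, make_faulty_unit, execute_checked, execute_with_sparing) and the 
single-retry default are assumptions made up for the illustration.

```python
import operator


class Miscompare(Exception):
    """The two units still disagree after all retries."""


def good_unit(op, *args):
    """A healthy execution unit: just computes the result."""
    return op(*args)


def make_faulty_unit(bad_calls):
    """Hypothetical unit that corrupts its first `bad_calls` results.

    The corruption flips the low bit, so this toy only works for
    integer results. bad_calls=1 models a soft (transient) error;
    a large value models a hard failure.
    """
    remaining = [bad_calls]

    def unit(op, *args):
        result = op(*args)
        if remaining[0] > 0:
            remaining[0] -= 1
            return result ^ 1
        return result

    return unit


def execute_checked(op, args, unit_a, unit_b, retries=1):
    """Run `op` on both units and compare; retry on a miscompare."""
    for _ in range(retries + 1):
        if unit_a(op, *args) == (result := unit_b(op, *args)):
            return result
    raise Miscompare("units disagree after retry")


def execute_with_sparing(op, args, processors, retries=1):
    """Try each processor (a pair of units) in turn.

    Returns (result, index of the processor that produced it).
    Index 0 is the active processor; the rest are spares.
    """
    for index, (unit_a, unit_b) in enumerate(processors):
        try:
            return execute_checked(op, args, unit_a, unit_b, retries), index
        except Miscompare:
            continue  # treat this processor as failed, move to a spare
    raise RuntimeError("no healthy processor left")


# Soft error: the retry on the same processor succeeds.
soft = [(good_unit, make_faulty_unit(1))]
print(execute_with_sparing(operator.add, (2, 3), soft))

# Hard error: the retry fails too, so the work moves to the spare.
hard = [(good_unit, make_faulty_unit(10)), (good_unit, good_unit)]
print(execute_with_sparing(operator.add, (2, 3), hard))
```

Note that comparing two results only tells you that something went wrong, not 
which half was wrong. That is why the response is a retry rather than a vote.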

This was, at the time, a new kind of error checking, but error checking in 
general was part of the original design of the System/360. I would like to 
believe that the latest processors are designed the same way, but I don't know.

I am not familiar with earlier processors, but my impression is that error 
checking was not new with System/360. When computers were built using vacuum 
tubes, errors would have been commonplace.

Memory was once parity checked. That gave way to Error Checking and Correction 
(ECC). Today, in addition to ECC, IBM uses Redundant Array of Independent 
Memory (RAIM).
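
The difference between parity and ECC can be shown with a small Python sketch. 
Parity can tell you that a single bit flipped but not which one. An error 
correcting code can locate the bit and fix it. The example uses the textbook 
Hamming(7,4) code purely as a stand-in. It is not the code used in any IBM 
memory, and the function names are made up for the illustration.

```python
def parity_bit(bits):
    """Even parity: the extra bit that makes the count of 1s even."""
    return sum(bits) % 2


def parity_ok(bits_with_parity):
    """True if the word (data plus parity bit) has an even number of 1s."""
    return sum(bits_with_parity) % 2 == 0


def hamming74_encode(data):
    """Encode 4 data bits as a 7-bit codeword.

    Layout by position 1..7: p1 p2 d1 p3 d2 d3 d4.
    """
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Return (data bits, error position).

    Position 0 means no error was detected. Otherwise the syndrome is
    the 1-based position of the flipped bit, which is corrected before
    the data bits are extracted.
    """
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s3
    if position:
        c[position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]], position


data = [1, 0, 1, 1]

# Parity: a single flipped bit is detected, but cannot be located.
stored = data + [parity_bit(data)]
stored[2] ^= 1
print("parity ok after flip:", parity_ok(stored))

# Hamming(7,4): the same kind of error is located and corrected.
codeword = hamming74_encode(data)
codeword[4] ^= 1  # flip position 5
print("decoded:", hamming74_decode(codeword))
```

This sketch corrects single-bit errors only. Two flipped bits in one codeword 
would be miscorrected, which is why practical memory codes add further check 
bits to at least detect double errors.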

Processors have long used parity checking on the busses used to interconnect 
components. Considerable other circuitry is included in processors to detect 
errors, but I don't have any specifics.

Many of these techniques have been documented in the IBM Journal of Research 
and Development. Unfortunately, a few years ago IBM decided to hide that behind 
a paywall.

-- 
Tom Marchant

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
