On 2018-12-18 at 09:43, Timothy Sipples wrote:
Radoslaw Skorupka:
Let's say a CPU returns false results like 2x2=5. How do you recognize
that the result is false?
The IBM Z (and LinuxONE) system handles all that for you, and without
operating system involvement. Nowadays, thanks to the wonders of
microelectronic miniaturization, that's through intensive, thorough
integrity checking at all critical instruction execution steps baked deep
into every processor, and with tons of "transistor budget" spent on
integrity checking and other RAS characteristics. The design philosophy is
to push error handling as far down in the "stack" as possible, and that's
what actually happens.
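
To make "integrity checking" concrete for the 2x2=5 example, here is a minimal sketch of one textbook technique, residue checking. It is only an illustration: the text above does not say which checks IBM Z actually uses, and the function names and the modulus of 3 are assumptions chosen for the example.

```python
def residue_checked_multiply(a, b, multiplier, modulus=3):
    """Multiply via `multiplier`, then verify the result with a residue check.

    The residue of a correct product must equal the product of the operand
    residues (mod `modulus`). This is a cheap, independent cross-check, not
    a second full multiplication.
    """
    result = multiplier(a, b)
    expected_residue = ((a % modulus) * (b % modulus)) % modulus
    if result % modulus != expected_residue:
        raise ArithmeticError(
            f"integrity violation: {a} x {b} gave {result}"
        )
    return result


def healthy_multiplier(a, b):
    # Hypothetical fault-free execution unit.
    return a * b


def faulty_multiplier(a, b):
    # Hypothetical broken execution unit: every result is off by one.
    return a * b + 1


if __name__ == "__main__":
    print(residue_checked_multiply(2, 2, healthy_multiplier))
    try:
        residue_checked_multiply(2, 2, faulty_multiplier)
    except ArithmeticError as err:
        print(err)
```

For 2x2 the expected residue is 1 (4 mod 3), while the faulty result 5 has residue 2, so the mismatch is caught without knowing the right answer in advance. A single residue check is not complete: an error that is an exact multiple of the modulus would slip through, which is one reason real designs layer several independent checks.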
Yes, z/OS has an amazing amount of wonderful error handling and recovery
logic, but the design philosophy (and reality) is "never" to invoke it:
issues such as exceedingly rare core failures are handled without z/OS
having to do anything, or even necessarily being aware that anything
happened. It's a defense-in-depth strategy that requires multiple very
long-tail risks to occur simultaneously before any error surfaces to the
OS for handling.
Moreover, the system doesn't even necessarily bother notifying you that
something happened that was automatically handled with aplomb. If a stray
cosmic ray flipped a bit, triggered an integrity violation, caused an
instruction retry, and then everything continued normally for an eternity
(in processor terms) without the operating system having to do anything,
should alarm bells ring so that you can spend (waste) your time chasing
that ghost (cosmic ray)? Probably not. So there are certain categories of
anomalous, infrequent, handled, and inconsequential events that don't even
raise any system eyebrows, as it were. I don't know exactly what they are,
it probably varies by model, and IBM might not even tell you. And there's
tremendous design sense in that approach, too, because invoking some sort
of notification logic for inconsequential events could, all by itself,
cause consequential errors. There's a lot of care and long-term field
experience that goes into making these design decisions, as I understand
it. The basic analogy here is that you shouldn't yell "Fire!" in a crowded
theater (or even an uncrowded one) unless there really could be a fire,
because the very act of yelling "Fire!" could cause more harm than good.
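
The detect, retry, and stay-quiet sequence described above can be sketched roughly as follows. This is a toy model under stated assumptions, not a description of the real hardware: the function names, the retry limit, and the counters are all hypothetical, and the check reuses a simple mod-3 residue test purely for illustration.

```python
def residue_ok(a, b, result, modulus=3):
    # Illustrative integrity check: residue of the result must match
    # the product of the operand residues.
    return result % modulus == ((a % modulus) * (b % modulus)) % modulus


def execute_with_retry(operation, check, max_retries=1, stats=None):
    """Run `operation`; on a failed check, retry before telling anyone.

    A fault that disappears on retry is only counted, never reported.
    A fault that persists is escalated to the caller.
    """
    if stats is None:
        stats = {"recovered": 0, "escalated": 0}
    for attempt in range(max_retries + 1):
        result = operation()
        if check(result):
            if attempt > 0:
                stats["recovered"] += 1  # handled silently
            return result
    stats["escalated"] += 1
    raise RuntimeError("persistent fault: escalate to the next layer")


def make_transient_multiplier(a, b):
    # Hypothetical unit that is wrong exactly once (the "cosmic ray").
    calls = {"n": 0}

    def multiply():
        calls["n"] += 1
        return a * b + 1 if calls["n"] == 1 else a * b

    return multiply


stats = {"recovered": 0, "escalated": 0}
value = execute_with_retry(
    make_transient_multiplier(2, 2),
    lambda r: residue_ok(2, 2, r),
    stats=stats,
)
```

Here the first attempt returns 5, fails the check, and is retried; the second attempt returns 4 and the caller never sees an error. Only the `recovered` counter records that anything happened. In this sketch a fault that survives every retry raises an exception, which stands in for whatever the next layer of recovery would be.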
The only currently marketed (as I write this) IBM Z or LinuxONE machine
models that can be (but certainly don't have to be) ordered and configured
without spare main processor cores are the IBM z13s (2965-N10 only) and the
IBM LinuxONE Rockhopper (2965-L10 only). However, every uncharacterized
core is a spare. You can order a single machine with 169 spare cores if you
wish. To do that you'd order an IBM z14 or IBM LinuxONE Emperor II with one
characterized core and 169 physically present but uncharacterized cores.
That's probably not a configuration you should order, but you can if you
insist.
Excellent essay!
Except... it doesn't answer the question: HOW DOES A CPU RECOGNIZE ITS OWN
FAILURE?
--
Radoslaw Skorupka
Lodz, Poland
----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN