Radoslaw Skorupka:
>Let's say a CPU returns false results like 2x2=5. How to recognize
>the result is false?

The IBM Z (and LinuxONE) system handles all that for you, without operating system involvement. Nowadays, thanks to the wonders of microelectronic miniaturization, that happens through intensive, thorough integrity checking at all critical instruction execution steps, baked deep into every processor, with tons of "transistor budget" spent on integrity checking and other RAS (reliability, availability, serviceability) characteristics. The design philosophy is to push error handling as far down in the "stack" as possible, and that's what actually happens.

Yes, z/OS has an amazing amount of wonderful error handling and recovery logic, but the design philosophy (and reality) is "never" to invoke it: issues such as exceedingly rare core failures get handled without z/OS having to do anything, or even necessarily being aware that anything happened. It's a defense in depth strategy that requires multiple very long tail risks to occur simultaneously before any error surfaces to the OS for handling.

Moreover, the system doesn't even necessarily bother notifying you that something happened that was automatically handled with aplomb. If a stray cosmic ray flipped a bit, triggered an integrity violation, caused an instruction retry, and then everything continued normally for an eternity (in processor terms) without the operating system having to do anything, should alarm bells ring so that you can spend (waste) your time chasing that ghost (cosmic ray)? Probably not. So there are certain categories of anomalous, infrequent, handled, and inconsequential events that don't even raise any system eyebrows, as it were. I don't know exactly what they are; it probably varies by model, and IBM might not even tell you.

And there's tremendous design sense in that approach, too, because invoking some sort of notification logic for inconsequential events could, all by itself, cause consequential errors. There's a lot of care and long-term field experience that goes into making these design decisions, as I understand it.

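To make the "check, retry, and only then escalate" idea concrete, here is a minimal software sketch. It is an analogy only: it does not describe how any IBM Z or LinuxONE core is actually built. The check used here (a mod-3 residue check on a multiplication) is just one well-known arithmetic-checking technique picked for illustration, and every name in it (`residue_ok`, `checked_multiply`, `make_flaky_multiplier`, `PersistentFault`) is hypothetical.

```python
def residue_ok(a, b, product, modulus=3):
    """Independent cheap check: the product's residue must equal the
    product of the operands' residues (all taken mod `modulus`)."""
    return product % modulus == ((a % modulus) * (b % modulus)) % modulus


class PersistentFault(Exception):
    """Raised only when retries are exhausted and the error must surface upward."""


def checked_multiply(a, b, multiply, max_retries=2):
    """Call multiply(a, b), verify the result, and silently retry on a failed check.

    Returns (product, retries_used). A transient fault never reaches the caller.
    """
    for attempt in range(max_retries + 1):
        product = multiply(a, b)
        if residue_ok(a, b, product):
            return product, attempt
    raise PersistentFault(f"{a} x {b} failed the check {max_retries + 1} times")


def make_flaky_multiplier(bad_results):
    """Hypothetical fault injector: hands back each value in bad_results once,
    then multiplies correctly from then on."""
    pending = list(bad_results)

    def multiply(a, b):
        if pending:
            return pending.pop(0)
        return a * b

    return multiply


if __name__ == "__main__":
    # One transient "2 x 2 = 5", then a healthy unit: handled without fuss.
    print(checked_multiply(2, 2, make_flaky_multiplier([5])))  # prints (4, 1)
```

A single mod-3 check is far from complete (it would wave through 2x2=7, since 7 and 4 leave the same remainder), which is one reason to layer several independent checks. Note also that `checked_multiply` raises nothing and logs nothing in the transient case; only a fault that survives every retry surfaces to the caller, which mirrors the point above about not notifying anyone of handled, inconsequential events.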
The basic analogy here is that you shouldn't yell "Fire!" in a crowded theater (or even an uncrowded one) unless there really could be a fire, because the very act of yelling "Fire!" could cause more harm than good.

The only currently marketed (as I write this) IBM Z or LinuxONE machine models that can be (but certainly don't have to be) ordered and configured without spare main processor cores are the IBM z13s (2965-N10 only) and the IBM LinuxONE Rockhopper (2965-L10 only). However, every uncharacterized core is a spare. You can order a single machine with 169 spare cores if you wish. To do that you'd order an IBM z14 or IBM LinuxONE Emperor II with one characterized core and 169 physically present but uncharacterized cores. That's probably not a configuration you should order, but you can if you insist.

--------------------------------------------------------------------------------------------------------
Timothy Sipples
IT Architect Executive, Industry Solutions, IBM Z & LinuxONE
E-Mail: sipp...@sg.ibm.com