[email protected] (David Crayford) writes: > Is that still the case today? Even cheap x86 blades have machine check > architecture which can signal software on hardware failures. It must > be over a decade or so since IBM started stuffing mainframe quality > RAM modules into x86 servers, chipkill etc. 90% of server failures > were due to RAM errors. You don't have to search too far to find > 99.999 platforms running Intel. You'll pay for it though.
Jim had a study at Tandem showing that by that time hardware failures had drastically declined and most outages had shifted to other factors (software, environmental, human mistakes) ... old overview from that study
http://www.garlic.com/~lynn/grayft84.pdf
disclaimer: I worked with Jim at IBM Research (before he left for Tandem) during System/R days (precursor to DB2).

commodity disk MTBF used to be 80,000 hrs ... then it increased to 800,000 hrs and has now nearly doubled to 1.4M hrs (that is w/o RAID technologies to mask failures). recent post
http://www.garlic.com/~lynn/2013o.html#7 Something to Think About - Optimal PDS Blocking

at IBM, we had done high-availability cluster systems with commodity parts and five-nines availability for HA/CMP ... some past posts
http://www.garlic.com/~lynn/subtopic.html#hacmp

somewhat as a result, I was asked to write a section for the corporate continuous availability strategy document ... however it got pulled when both Rochester (as/400) and POK (mainframe) complained that they couldn't meet the numbers. some past posts
http://www.garlic.com/~lynn/submain.html#available

in one scenario for a 1-800 system ... we were up against a hardware fault-tolerant system for five-nines availability. It turned out that at the system level ha/cmp met the objective ... but the hardware fault-tolerant system needed scheduled downtime once a year for software maintenance ... which blew a century of downtime allowance. They came back with a cluster solution of replicated systems ... to mask the outage for software maintenance ... but that then negated the need for the expensive hardware fault-tolerant implementation.

as mentioned in the commodity disk references, the large cloud megadatacenters have done extensive studies on price/availability ... part of the strategy (as with HA/CMP) is akin to disk RAID ... but applied to the rest of the infrastructure. slight topic drift ...
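the five-nines and MTBF numbers above reduce to simple back-of-envelope arithmetic; a sketch (the constant-failure-rate approximation for MTBF is my assumption, not from the Tandem study):

```python
# Back-of-envelope availability arithmetic; figures taken from the post,
# formulas are standard approximations (assumed, not from the original study).

HOURS_PER_YEAR = 365.25 * 24  # ~8766

# Disk MTBF -> rough annualized failure rate. Valid when MTBF >> 1 year,
# assuming a constant failure rate and ignoring wear-out effects.
for mtbf_hours in (80_000, 800_000, 1_400_000):
    afr = HOURS_PER_YEAR / mtbf_hours
    print(f"MTBF {mtbf_hours:>9,} hrs -> ~{afr:.1%}/yr failure rate")

# Five-nines availability allows (1 - 0.99999) of the year as downtime.
allowed_min_per_year = (1 - 0.99999) * HOURS_PER_YEAR * 60  # ~5.26 min/yr
century_budget_hrs = 100 * allowed_min_per_year / 60        # ~8.8 hrs

print(f"five-nines budget: ~{allowed_min_per_year:.2f} min/year")
print(f"a century of that budget: ~{century_budget_hrs:.1f} hours")
```

so an 80,000-hr-MTBF drive fails roughly 11% of the time per year while a 1.4M-hr drive is under 1%, and a single maintenance outage of around nine hours is enough to consume a hundred years' worth of five-nines downtime allowance.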
Why Programmers Work At Night
http://www.businessinsider.com/why-programmers-work-at-night-2013-1
and old post with "Real Programmers Don't Eat Quiche"
http://www.garlic.com/~lynn/2001e.html#31

Real Programmers never work 9 to 5. If any Real Programmers are around at 9 AM, it's because they were up all night.

... this was back in the days before computer screens and typewriter computer terminals, and working evenings was so you could concentrate and not be interrupted ... solving very complex issues required intense, uninterrupted concentration. this contrasts somewhat with individuals that crave constant interaction and long, unproductive meetings.

I mention my first student programming job was porting 1401 MPIO to the 360/30. The univ. had a 709/1401 combo ... with the 1401 handling front-end tape<->printer/punch/card-reader work and the 709 running ibsys tape->tape, with manual moving of tapes between the 709 and 1401 tape drives. The univ. had been sold a 360/67 (for tss/360) to replace the 709/1401, and a 360/30 replaced the 1401 during the transition. The 360/30 had 1401 hardware emulation that ran MPIO just fine ... so my job redoing MPIO for the 360/30 could be considered just getting familiar with 360. However, I got to design and implement my own monitor, device drivers, interrupt handlers, error recovery, scheduling, dispatching, console interface, and storage management. The datacenter shut down at 8am Saturday and I got the whole room to myself from 8am Saturday to 8am Monday that summer. This continued into the school year ... although it made Monday morning classes a little hard, having been up (w/o sleep) for 48hrs. It wasn't just night ... it was 48hrs of total uninterrupted concentration.

Eventually the 360/67 was installed, but tss/360 never made it to production quality, so the machine ran mostly as a 360/65 running os/360 ... and I was hired fulltime for system support. One of the issues was that student fortran jobs ran in less than a second elapsed time on the 709 (ibsys tape->tape) ...
but ran over a minute elapsed time on the 360/65. This was reduced to under a minute with the addition of HASP. I started doing careful stage2 sysgens to optimize disk arm movement and PDS member search ... taking the stage2 output of stage1 and reordering all the cards to carefully place datasets and PDS members on disk. This increased student fortran throughput by nearly another factor of three (os/360 did enormously heavy disk access, including dragging in large numbers of multi-load transient SVC routines 2kbytes at a time). Student fortran job throughput never did beat the 709 until WATFOR was installed ... recent post
http://www.garlic.com/~lynn/2013o.html#54 Curiosity: TCB mapping macro name - why IKJTCB?

--
virtualization experience starting Jan1968, online at home since Mar1970

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN
