[email protected] (David Crayford) writes:
> Is that still the case today? Even cheap x86 blades have machine check
> architecture which can signal software on hardware failures. It must
> be over a decade or so since IBM started stuffing mainframe quality
> RAM modules into x86 servers, chipkill etc. 90% of server failures
> were due to RAM errors. You don't have to search too far to find
> 99.999 platforms running Intel. You'll pay for it though.

Jim had done a study at Tandem showing that, by that time, hardware
failures had drastically declined and most outages had shifted to other
factors (software, environmental, human mistakes) ... an old overview
from that study
http://www.garlic.com/~lynn/grayft84.pdf

disclaimer: I worked with Jim at IBM Research (before he left for
Tandem) during System/R days (precursor to DB2)

commodity disk mtbf used to be 80,000 hrs ... then it increased to
800,000 hrs and has now nearly doubled to 1.4m hrs (that is w/o RAID
technologies to mask failures). recent post
http://www.garlic.com/~lynn/2013o.html#7 Something to Think About - Optimal PDS Blocking
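as a rough back-of-envelope, those mtbf figures convert to annualized
failure rates ... a minimal python sketch, assuming a constant-hazard
(exponential) failure model, which real drives only approximate (actual
drives show infant-mortality and wear-out effects):

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8760

def annualized_failure_rate(mtbf_hours):
    """Probability a drive fails within one year, under a
    constant-hazard (exponential) failure model -- a simplifying
    assumption, not a claim about real drive behavior."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)

# the three mtbf generations mentioned above
for mtbf in (80_000, 800_000, 1_400_000):
    print(f"{mtbf:>9} hr MTBF -> {annualized_failure_rate(mtbf):.2%} AFR")
```

so the move from 80,000 hrs to 1.4m hrs takes a drive from roughly a
one-in-ten chance of failing in a given year down to well under one
percent (before any RAID-style masking).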

At IBM, we had done high-availability cluster systems with commodity
parts and five-nines availability for HA/CMP ... some past posts
http://www.garlic.com/~lynn/subtopic.html#hacmp

somewhat as a result, I was asked to write a section for the corporate
continuous availability strategy document ... however it got pulled when
both Rochester (as/400) and POK (mainframe) complained that they
couldn't meet the numbers. some past posts
http://www.garlic.com/~lynn/submain.html#available

in one scenario for a 1-800 system ... we were up against a hardware
fault-tolerant system for five-nines availability. It turns out that at
the system level ha/cmp met the objective ... but the hardware
fault-tolerant system needed scheduled downtime once a year for software
maintenance ... which blew a century of downtime allowance. They came
back with a cluster solution of replicated systems ... to mask the
outage for software maintenance ... but that then negated the need for
the expensive hardware fault-tolerant implementation.
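the five-nines arithmetic is easy to check ... a minimal sketch; the
9hr maintenance-window length here is a hypothetical illustration, not
a figure from the actual comparison:

```python
MIN_PER_YEAR = 365 * 24 * 60  # 525,600 minutes

availability = 0.99999  # five nines
budget_min_per_year = MIN_PER_YEAR * (1 - availability)  # ~5.26 min/yr

# a single scheduled software-maintenance window (hypothetical 9hr length)
maintenance_min = 9 * 60
years_of_budget = maintenance_min / budget_min_per_year

print(f"five-nines budget: {budget_min_per_year:.2f} min/yr")
print(f"a {maintenance_min // 60}hr outage burns {years_of_budget:.0f} years of budget")
```

five nines allows only about 5.26 minutes of downtime per year, so a
single maintenance outage measured in hours really does consume on the
order of a century of allowance.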

as mentioned in the commodity disk references, the large cloud
megadatacenters have done extensive studies on price/availability
... part of the strategy (as well as HA/CMP) is akin to disk raid
... but applied to the rest of the infrastructure.
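the raid-style math works the same way for the rest of the
infrastructure ... a minimal sketch, assuming independent failures
(a strong assumption -- correlated failures are the hard part in
practice) and a hypothetical "two nines" commodity node:

```python
def cluster_availability(node_avail, replicas):
    """Availability of a cluster that stays up while at least one
    replica is up, assuming replica failures are independent."""
    return 1.0 - (1.0 - node_avail) ** replicas

# hypothetical commodity node at two nines (0.99)
for n in (1, 2, 3):
    print(f"{n} replica(s): {cluster_availability(0.99, n):.6f}")
```

i.e. under the independence assumption, three replicas of a two-nines
commodity node get you to six nines ... replication of cheap parts,
rather than expensive fault-tolerant hardware.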

slight topic drift ... Why Programmers Work At Night
http://www.businessinsider.com/why-programmers-work-at-night-2013-1

and old post with "Real Programmers Don't Eat Quiche"
http://www.garlic.com/~lynn/2001e.html#31 

Real Programmers never work 9 to 5. If any Real Programmers are around
at 9 AM, it's because they were up all night. 

... this was back in the days before computer screens and typewriter
computer terminals, and working evenings was so you could concentrate
and not be interrupted ... solving very complex issues required
intense, uninterrupted concentration. this is somewhat at odds with
individuals that crave constant interaction and long, unproductive
meetings.

I've mentioned that my first student programming job was porting 1401
MPIO to the 360/30. The univ. had a 709/1401 combo ... with the 1401
handling front-end tape<->printer/punch/card-reader work ... the 709 ran
ibsys tape->tape, with tapes manually moved between the 709 and 1401
tape drives.

The univ. had been sold a 360/67 (for tss/360) to replace the 709/1401,
and a 360/30 replaced the 1401 during the transition. The 360/30 had
1401 hardware emulation that ran MPIO just fine ... so my job of redoing
MPIO for the 360/30 could be considered just getting familiar with the
360. However, I got to design and implement my own monitor, device
drivers, interrupt handlers, error recovery, scheduling, dispatching,
console interface, and storage management.

The datacenter shut down at 8am sat and I got the whole room to myself
from 8am sat to 8am mon that summer. This continued into the school year
... although it made Monday morning classes a little hard, having been
up (w/o sleep) for 48hrs. It wasn't just night ... it was 48hrs of total
uninterrupted concentration.

Eventually the 360/67 was installed, but tss/360 never made it to
production quality, so the machine mostly ran as a 360/65 running os/360
... and I was hired fulltime for system support.

One of the issues was that student fortran jobs ran in less than a
second elapsed time on the 709 (ibsys tape->tape) ... but took over a
minute elapsed time on the 360/65. This was reduced to under a minute
with the addition of HASP. I started doing careful stage2 sysgens to
optimize disk arm movement and pds member search ... taking the stage2
output of stage1 and reordering all the cards to carefully place
datasets and pds members on disk. This increased student fortran
throughput by nearly another factor of three (os/360 involved enormously
heavy disk access, including dragging large numbers of multi-load
transient SVCs 2kbytes at a time).

Student fortran job throughput never did beat the 709 until WATFOR was
installed ... recent post
http://www.garlic.com/~lynn/2013o.html#54 Curiosity: TCB mapping macro name - why IKJTCB?

-- 
virtualization experience starting Jan1968, online at home since Mar1970

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN
