David:
This goes back to the late '70s. We had a 168 MP that developed intermittent S0C4s after installation. It seemed related to the load on the system. The CE diagnostics couldn't find a thing.
One of our sysprogs wrote a program that reliably reproduced the S0C4 (it did a LOAD and DELETE of IEF21WSD repeatedly), so IBM considered it reasonable proof, and the skies darkened with IBM types, as they say. After a day they found that one of the tri-leads was too long (sorry, I don't remember how much too long it was), so the signal to the high-speed buffer wasn't right.
Damnedest hardware bug I ever saw.
Ed
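[Editorially, the stress technique Ed describes can be sketched in modern terms. This is an analogue only: the original was an MVS assembler program issuing LOAD and DELETE macros against IEF21WSD in a tight loop; the Python below repeatedly imports and discards a module, with "json" as a stand-in target, purely to illustrate the load/unload hammering idea.]

```python
# Hedged analogue of the sysprog's test: the original issued MVS LOAD
# and DELETE macros against IEF21WSD in a loop; here we repeatedly
# import and discard a Python module instead. "json" is a placeholder
# target, not anything from the original story.
import importlib
import sys

def load_delete_stress(module_name: str, iterations: int) -> int:
    """Import and discard a module repeatedly; return the loop count."""
    for _ in range(iterations):
        mod = importlib.import_module(module_name)  # analogue of LOAD
        del sys.modules[module_name]                # analogue of DELETE
        del mod                                     # drop our reference
    return iterations

print(load_delete_stress("json", 1000))
```

On real hardware, a loop like this exercises the same fetch/release path over and over, which is exactly why an intermittent, load-dependent fault shows up under it when ordinary diagnostics pass.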
On Jan 7, 2014, at 7:59 PM, David Crayford wrote:
On 8/01/2014 12:05 AM, Scott Ford wrote:
I agree with Joel. PC-based platforms in my experience have been very error-prone on the hardware side, maybe due to the components. Like Joel, I haven't seen a hardware failure in the z/OS world since the '70s.
I've seen quite a few hardware failures on mainframes; they happen quite frequently. They almost never cause an outage because there is redundancy. Most of the time we didn't even know we had a failure until IBM contacted us to let us know they had dispatched an engineer. Almost all enterprise systems are the same, even x86: they have n+1 redundancy for hardware components and clustering for HA. Your friendly IBM salesman will be only too happy to talk to you about an x86 high-availability hardware/software platform.
Of course, the data center behemoths like Google, Facebook, Amazon et al. choose to buy the cheapest bare-metal commodity components, with redundancy done by the software. At that scale it's the only model that makes economic sense.
Scott Ford
www.identityforge.com
from my iPad
On Jan 7, 2014, at 9:59 AM, David Crayford <[email protected]>
wrote:
On 07/01/2014, at 6:57 AM, "Joel C. Ewing" <[email protected]> wrote:
The first step to successfully diagnosing and repairing a software failure is to be certain it IS a software issue and not some random hardware glitch. This is made more difficult in the Intel world by the very thing that makes these platforms affordable: a multitude of manufacturers of motherboards, memory, hardware interface cards, and peripherals, all applying their own concept of "acceptable" engineering design while trying to make fast and cheap hardware.
Is that still the case today? Even cheap x86 blades have a machine-check architecture that can signal software about hardware failures. It must be a decade or more since IBM started putting mainframe-quality RAM modules into x86 servers, Chipkill etc., when around 90% of server failures were due to RAM errors. You don't have to search too far to find 99.999% platforms running Intel. You'll pay for it, though.
----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN