Our first UPS, many years ago, could power all the equipment in the
computer room and the building chilled-water pumps for only 15 minutes,
with no generator backup. This was still a big improvement over no UPS,
because 99% of our utility glitches at the time lasted at most a few
seconds, and if an outage ever ran more than 2 minutes you could be
pretty sure it would outlast the UPS battery. We had documented
procedures and automation in place with NETVIEW and the freebie
NETINIT/NETSTOP CBT tool for mainframe system shut down in under 10
minutes, and if the power still wasn't back by then, to proceed to power
down the mainframe and various other equipment. There wasn't any
documented procedure for orderly non-mainframe server shut down other
than to locate every server still making noise or showing lights, find
its power button, and press.
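None of our tooling would apply outside our shop, but the escalation logic above - run the scripted shutdown if battery runtime allows, otherwise go straight to powering down - can be sketched generically. This is a hypothetical illustration, not the actual NETVIEW/NETINIT automation; the function and action names are made up:

```python
def shutdown_plan(battery_minutes, scripted_minutes=10):
    """Choose a shutdown strategy from remaining UPS battery runtime.

    Hypothetical sketch of the escalation described above: if the
    battery can cover the scripted software shutdown (about 10 minutes
    in our case), run it and then power down; otherwise skip straight
    to stopping the processor and cutting power.
    """
    if battery_minutes > scripted_minutes:
        return ["run scripted system shutdown",
                "power down mainframe and remaining equipment"]
    return ["stop processor",
            "power down immediately"]
```

In practice the hard part is not the branch but keeping the scripted path's time estimate honest, since battery runtime degrades as cells age.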
The same basic procedures were used in the event of loss of cooling to
the computer room, although typically there was more time to react as
long as the chilled water was still flowing. But with the old water
cooled beasts, if the building chilled water pumps also failed, there
was at best a minute or two - insufficient time to attempt orderly
system shut down. In those cases, just stopping the processor and then
powering it down was always deemed preferable to testing whether the
processor's thermal cut-off would prevent damage.
Those procedures and the system shut-down automation were kept in place
even after we eventually got an emergency generator capable of
supplying the entire building, because there was still the possibility
of an environmental-system failure, or a failure in bringing the
generator on line. The same automation that was originally created for emergency
shut down is used regularly for shut-downs before scheduled IPLs. We do
not test or train for actual hardware power down, so this now only gets
tested on very rare occasions when disruptive maintenance must be done
on the building power or environmental systems.
Power-up procedures are also documented, but again seldom tested.
Someone from Tech Services is always on call and would be involved if a
power down or power up were ever required. Tech Services is also
involved in the physical planning and installation of all hardware in
the computer room, and so is in a good position to know how to handle
anything too recent to have made it into the formal power-down and
power-up documentation.
Joel C Ewing
On 12/03/2010 11:45 AM, Darth Keller wrote:
An interesting question came up this morning - all your multiple power
sources have just failed. Your generator(s) started but, for whatever
reason, have also failed. You're now on battery power and have 23 minutes
to power everything off as gracefully as possible. Do you have procedures
in place to do it?
I don't even want to think about all the open-systems stuff, my head would
explode. But I don't think this is a trivial exercise even from the
mainframe side. I'm thinking you almost have to think about this in the
same way you would approach planning a DR event. Maybe you have a couple
of scenarios -
1. I know I'm going to lose power in 3 hours.
2. I know I've only got 23 minutes & the clock is already running.
Do you get as much of the software shut off as possible & just let the
hardware take care of itself?
Do other companies have plans in place for this? Is it reviewed with some
frequency? Pretty hard to test unless you have your own DR site, but do
you at least do periodic walk-throughs as an exercise? Do you have or
need a procedure on how to restart after an EPO of any duration?
Have I found a new career path or should I just ask for my medications to
be adjusted?
...
--
Joel C. Ewing, Fort Smith, AR [email protected]