On 7/9/2013 10:28 PM, Erik Levinson wrote:
As some may know, yesterday 151 Front St suffered a cooling failure
after Enwave's facilities were flooded.
One of the suites that we're in recovered quickly but the other took
much longer, and some of our gear shut down automatically due to
overheating. We remotely shut down many redundant and non-essential
systems in the hotter suite, and remotely transferred some others to
the cooler suite, to ensure that we had a minimum of all core systems
running in the hotter suite. We waited until the temperatures
returned to normal, and brought everything back online. The entire
event lasted from approx 18:45 until 01:15. Apparently ambient
temperature was above 43 degrees Celsius at one point on the cool
side of cabinets in the hotter suite.
For those who have gone through such events in the past, what can one
expect in terms of long-term impact...should we expect some premature
component failures? Does anyone have any stats to share?
No stats, but way back in the day of very large computers (1 each) in
very large facilities, it seems like the thing we worried most about at
restart was too-rapid cooling and the resulting condensation if the
conditions were right.
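
To make "if the conditions were right" concrete: condensation forms when
a surface is at or below the dew point of the air reaching it. Below is a
minimal Python sketch, assuming the Magnus approximation for dew point.
The function names, the particular constants, and the 2 degree safety
margin are illustrative assumptions, not anything measured during the
event described above.

```python
import math

# Magnus approximation constants for water vapour over liquid water.
# This is one commonly used parameter set; treat it as an assumption.
MAGNUS_B = 17.62
MAGNUS_C = 243.12  # degrees Celsius


def dew_point_c(air_temp_c, rel_humidity_pct):
    """Approximate dew point (deg C) from air temperature and relative humidity."""
    if not 0 < rel_humidity_pct <= 100:
        raise ValueError("relative humidity must be in (0, 100]")
    gamma = (math.log(rel_humidity_pct / 100.0)
             + MAGNUS_B * air_temp_c / (MAGNUS_C + air_temp_c))
    return MAGNUS_C * gamma / (MAGNUS_B - gamma)


def condensation_risk(surface_temp_c, air_temp_c, rel_humidity_pct, margin_c=2.0):
    """True if a surface is within margin_c of (or below) the air's dew point.

    margin_c is a hypothetical safety margin chosen for illustration.
    """
    return surface_temp_c <= dew_point_c(air_temp_c, rel_humidity_pct) + margin_c


if __name__ == "__main__":
    # Hypothetical readings: warm room air meeting a rapidly chilled surface.
    print(round(dew_point_c(25.0, 50.0), 1))
    print(condensation_risk(10.0, 25.0, 50.0))
```

The approximation is only a rough guide; real readings from sensors near
the equipment would matter more than any formula.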
After power-up, the next concern was disk crashes that had occurred on
the way down (this was a long time ago; discs and drums are different
now). Last were overheat failures, which were relatively few and always
in components with a reputation for weakness.
Requiescas in pace o email
Ex turpi causa non oritur actio

Two identifying characteristics of System Administrators:
Infallibility, and the ability to learn from their mistakes.
(Adapted from Stephen Pinker)