On Fri, 31 Aug 2007 16:06:36 +0800, Ian Kent said: > So, there's a power outage and the UPS had a glitch.
Murphy can get a *lot* more creative than that. So we'd outgrown the capacity on our UPS and diesel generator, and decided to replace them. So we schedule downtime for a Saturday. Rather scary, we had a Sun E10K that had been powered-up for several years, and just as expected, a good fraction of the 400+ drives it had failed to re-spinup. While recovering from that, we discovered that although the vast majority of the 400 drives were either mirrors or raidsets, due to a config error, the boot volume wasn't mirrored (fortunately, it spun up OK so we dodged the bullet), so we fixed that. Literally the next Friday, not even a week later, a contractor relocating a door into our machine room shorted out a sensor circuit in our fire suppression system, triggering a Halon dump. Of course, no amount of UPS and diesel was going to save us now, because there was a safety interlock that killed the power feeds if the Halon dumped. This time, since they'd all been stressed just a week before, only 2 of the 400+ disks on the E10K failed to spin up. Guess which two. ;)
pgpQoTkLIZTYK.pgp
Description: PGP signature