> Actually, it was a very complex power outage. I'm going to assume that what 
> happened this weekend was similar to the event that happened at the same 
> facility approximately two weeks ago (it's immaterial - the details are 
> probably different, but it illustrates the complexity of a data center 
> failure)
>
> Utility Power Failed
> First Backup Generator Failed (shut down due to a faulty fan)
> Second Backup Generator Failed (breaker coordination problem resulting in 
> faulty trip of a breaker)
>
> In this case, it was clearly a cascading failure, although only limited in 
> scope. The failure in this case also clearly involved people. There was one 
> material failure (the fan), but the system should have been resilient enough 
> to deal with it. The system should also have been resilient enough to deal 
> with the breaker coordination issue (which should not have occurred), but was 
> not. Data centers are not commodities. There is a way to engineer these 
> facilities to be much more resilient. Not everyone's business model supports 
> it.

ok, i give in.  at some level of granularity everything is a cascading
failure (since molecules collide and the world is an infinite chain of
causation in which human free will is merely a myth </Spinoza>)

of course, this use of 'cascading' is vacuous and no longer useful
since it applies to nearly every failure, but i'll go along with it.

from the perspective of a datacenter power engineer, this was a
cascading failure of a small number of components.

from the perspective of every datacenter customer:  this was a power failure.

from the perspective of people watching B-rate movies:  this was a
failure to implement and test a reliable system for streaming those
movies in the face of a power outage at one facility.

from the perspective of nanog mailing list readers:  this was an
interesting opportunity to speculate about failures about which we
have no data (as usual!).

can we all agree on those facts?

:-)

t
