> Subject: RE: Amazon diagnosis
> Date: Sun, 1 May 2011 12:50:37 -0700
> From: George Bonser <gbon...@seven.com>
>
> They apparently had a redundant primary network and, on top of that, a
> secondary network. The secondary network, however, did not have the
> capacity of the primary network.
>
> Rather than failing over from the active portion of the primary network
> to the standby portion of the primary network, they inadvertently failed
> the entire primary network to the secondary. This resulted in the
> secondary network reaching saturation and becoming unusable.
>
> There isn't anything that can be done to mitigate against human error.
> You can TRY, but as history shows us, it all boils down the human that
> implements the procedure. All the redundancy in the world will not do
> you an iota of good if someone explicitly does the wrong thing. ...
>
> This looks like it was a procedural error and not an architectural
> problem.

A sage sayeth sooth: "For any 'fool-proof' system, there exists a
*sufficiently determined* fool capable of breaking it."

It would seem that the validity of that has just been re-confirmed.
<wry grin>

It is worthy of note that it is considerably harder to protect against
accidental stupidity than it is to protect against intentional malice.
('malice' is _much_ more predictable, in general. <wry grin>)