On Mon, Mar 4, 2013 at 10:40 AM, Saku Ytti <s...@ytti.fi> wrote:
> On (2013-03-04 13:23 -0500), Jeff Wheeler wrote:
>
>> We have lots of stupid people in our industry because so few
>> understand "The Way Things Work."
>
> We have tendency to view mistakes we do as unavoidable human errors and
> mistakes other people do as avoidable stupidity.
>
> We should actively plan for mistakes/errors, if you actively plan for no
> 'stupid mistakes', you're gonna have bad time
>
> From my point of view, outages are caused by:
> 1) operator
> 2) software defect
> 3) hardware defect
>
> Most people design only against 3), often with design which actually
> increases likelihood of 2) and 1), reducing overall MTBF on design which
> strictly theoretically increases it.
...And a lot of people who know the hierarchy solve 3 and then solve 2 in a way that increases 1 (multiple parallel environments with different vendors' equipment), only to find that 1 increased due to the additional complexity.

On the other hand, I've seen people who had horrible explosions of 2 or 3 due to ignoring all but 1.

If you ACTUALLY need that many 9s, you need all of redundancy, diversity of vendors, and suitably trained, exercised, process-supported net admins. That's a few multiples of 2 more expense than nearly anyone typically wants to pay for.

-- 
-george william herbert
george.herb...@gmail.com
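A rough way to see the trade-off discussed in this thread is a back-of-the-envelope availability calculation. The sketch below assumes the three outage sources (hardware, software, operator) are independent and that any one of them takes the service down; the per-source unavailability figures and the COMPLEXITY multiplier are hypothetical numbers chosen for illustration, not measurements. Adding a redundant box squares the hardware term, but if the extra complexity multiplies software and operator error, overall availability can end up lower than the simple design's.

```python
def availability(u_hw: float, u_sw: float, u_op: float) -> float:
    """Availability when each independent failure source alone causes an outage."""
    return (1 - u_hw) * (1 - u_sw) * (1 - u_op)

# Hypothetical per-source unavailability for a single, simple box.
U_HW, U_SW, U_OP = 1e-3, 1e-3, 1e-3

single = availability(U_HW, U_SW, U_OP)

# 1+1 redundant pair: hardware causes an outage only if both boxes fail
# (assumed independent), so the hardware term is squared. Software and
# operator error are not removed by the redundancy.
# COMPLEXITY is a hypothetical multiplier for the extra software and
# operator risk that the more complex design introduces.
COMPLEXITY = 3.0
redundant = availability(U_HW ** 2, U_SW * COMPLEXITY, U_OP * COMPLEXITY)

# Same redundant pair if it added no complexity at all (COMPLEXITY = 1).
redundant_ideal = availability(U_HW ** 2, U_SW, U_OP)

print(f"single box:               {single:.6f}")
print(f"redundant, no complexity: {redundant_ideal:.6f}")
print(f"redundant, 3x complexity: {redundant:.6f}")
```

With these assumed numbers, the "ideal" redundant pair beats the single box, while the version that triples software and operator risk does worse than the single box, which is the effect Saku describes as reducing overall MTBF on a design that theoretically increases it.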