[EMAIL PROTECTED] (Alex Martelli) writes:
> Yeah, good question indeed, and I'm asking myself that -- somebody who
> posts to this group in order to attack the reliability of the language
> the group is about (and appears to be supremely ignorant about its use
> in air-traffic control and for high-reliability mission-critical
> applications such as Google's "Production Systems" software) might well
> be considered not worth responding to. OTOH, you _did_ irritate me
> enough that I feel happier for venting in response;-)
Hi Alex,

I'm a little confused: does "Production Systems" mean stuff like the Google search engine, which (as you described further up in your message) achieves its reliability at least partly through massive redundancy and failover when something breaks? In that case, why is it so important that the software be highly reliable? Is a software fault really worse than a hardware fault, especially if it's permissible to sometimes let a transaction (like a search query) go uncompleted (e.g. by displaying a "try again later" message)? If you get 1 billion queries in a month, and half a dozen of them don't complete (e.g. they give empty or incorrect results when there are good hits they should display) but the server is never actually down, can you still claim 100% uptime?

There's a philosophy in Erlang described as "let it crash": programmers are told NOT to program defensively, e.g. by checking inputs for validity. Instead, they should rely on the fault-tolerance and process-restart mechanisms to get things going again if their process fails. Similarly, if the Google search software hits some fatal condition once in a while, maybe it's enough to just treat it as a crashed box and let the failover mechanisms handle the problem. Of course, there's then a second-level system managing the restarts that has to be very reliable, but it doesn't have to deal with the kind of weird, concocted input that a public-facing internet application does.

So I think Russ's point stands: the kind of reliability we're talking about in these highly redundant systems is different from the kind needed in the systems Russ is describing.

--
http://mail.python.org/mailman/listinfo/python-list