Tony Abernethy wrote:
> Geoff Steckel wrote:
>> And yes, error recovery is a very significant part of any non-trivial
>> useful program which does (for instance) network I/O, because the
>> universe of possible errors is large.
> Error recovery?
> Does anyone ever debug error recovery?
> Is there any way anyone **COULD** debug error recovery?
> An order of magnitude more complicated and no tools --- predictable.
> Maybe I'm overly pessimistic, but if so, (try to) prove me wrong.

In the general case you're absolutely correct.
If you separate errors into

 1. local environment problems: sudden memory shortage,
    disk I/O error, hardware errors
      Usually these can only be dealt with by exiting
      as cleanly as possible as quickly as possible.
 2. internal program errors/inconsistencies
      Exit cleanly after recording as much as possible
      about the input provoking the problem.
      These are (almost) always symptoms of bugs.
 3. input data malformed, inconsistent, or missing
 4. peer or server failure (no communication or bad communication)
      This is the area which can be dealt with.

A deceptively simple strategy handles the last case:

 - There must be only one point in the entire program which
   can block waiting for input from the outside world.
 - That point must have an analysis function which
   decodes the message, and a scheduler which
   vectors depending on the decoded message and the explicit
   current state.
 - All functionality dispatches from this point and
   returns to this point without blocking for anything
   other than local disk I/O, returning a "what was
   done" code which the scheduler uses to compute the
   next state. Errors detected are reflected in the
   "what was done" code, setting explicit error states as needed.
 - All errors have explicit new states from an explicit
   state transition table - NOT from random if-then-elses.
 - All input and output messages can be traced for
   debugging.

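
As a rough illustration of the structure just described, here is a minimal
sketch in Python. It is not taken from any real program: the states, message
kinds, result codes, handler names, and the in-memory queue standing in for
the outside world are all assumptions invented for the example.

```python
import queue

# Hypothetical states, invented for this sketch.
IDLE, WAITING, DONE, FAILED = "IDLE", "WAITING", "DONE", "FAILED"

def decode(raw):
    """Analysis function: classify a raw message. It never raises."""
    if raw is None:
        return "TIMEOUT"
    if raw == b"HELLO":
        return "HELLO"
    if raw.startswith(b"DATA "):
        return "DATA"
    return "GARBAGE"

# Handlers do their work without blocking and return a "what was done" code.
def send_greeting(raw):
    return "OK"

def store_data(raw):
    return "OK" if len(raw) > len(b"DATA ") else "BAD_INPUT"

def give_up(raw):
    return "ABORTED"

# (state, message kind) -> handler. Anything not listed goes to give_up.
DISPATCH = {
    (IDLE, "HELLO"): send_greeting,
    (WAITING, "DATA"): store_data,
}

# (state, "what was done" code) -> next state. Errors get explicit entries.
TRANSITIONS = {
    (IDLE, "OK"): WAITING,
    (IDLE, "ABORTED"): FAILED,
    (WAITING, "OK"): DONE,
    (WAITING, "BAD_INPUT"): WAITING,  # stay put and wait for a resend
    (WAITING, "ABORTED"): FAILED,
}

def run(inbox, timeout=0.1):
    """Scheduler loop. Returns the final state and a trace of every step."""
    state, trace = IDLE, []
    while state not in (DONE, FAILED):
        try:
            raw = inbox.get(timeout=timeout)  # the ONLY blocking point
        except queue.Empty:
            raw = None  # a timeout is just another message kind
        kind = decode(raw)
        handler = DISPATCH.get((state, kind), give_up)
        result = handler(raw)
        new_state = TRANSITIONS[(state, result)]
        trace.append((state, kind, result, new_state))
        state = new_state
    return state, trace
```

In a real program the handlers would queue non-blocking output rather than
return constants, but the shape stays the same: one blocking read, one decode,
one table lookup, and a trace entry per step.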
This -can- be analyzed and (to a greater or lesser extent)
debugged. Some errors cannot be recovered from other than
by cleaning up the debris and exiting or starting over.
It is possible to be reasonably sure that the program will
never hang forever or loop forever (absent internal errors
of class 2 above).
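
One way to gain some of that confidence is to check the transition table
mechanically. The sketch below, in Python, assumes a table of the form
(state, result code) -> next state; the states and result codes are made up
for the example. It is a cheap sanity check, not a proof: it only finds
missing entries and states with no path to a terminal state.

```python
# Hypothetical states, result codes and table, invented for this sketch.
STATES = {"IDLE", "WAITING", "DONE", "FAILED"}
TERMINAL = {"DONE", "FAILED"}
RESULTS = {"OK", "BAD_INPUT", "ABORTED"}

TRANSITIONS = {
    ("IDLE", "OK"): "WAITING",
    ("IDLE", "BAD_INPUT"): "FAILED",
    ("IDLE", "ABORTED"): "FAILED",
    ("WAITING", "OK"): "DONE",
    ("WAITING", "BAD_INPUT"): "WAITING",
    ("WAITING", "ABORTED"): "FAILED",
}

def missing_entries(transitions, states, terminal, results):
    """List (state, result) pairs that have no explicit next state."""
    return sorted(
        (s, r)
        for s in states - terminal
        for r in results
        if (s, r) not in transitions
    )

def states_that_cannot_finish(transitions, states, terminal):
    """List states from which no terminal state is reachable."""
    can_finish = set(terminal)
    changed = True
    while changed:  # propagate backwards until nothing new is found
        changed = False
        for (src, _result), dst in transitions.items():
            if dst in can_finish and src not in can_finish:
                can_finish.add(src)
                changed = True
    return sorted(states - can_finish)
```

Both functions returning an empty list means every non-terminal state has an
answer for every result code and at least one way out.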
It is isomorphic with a state driven parser. Indeed, some fairly
complex problems can be turned into explicit grammars.
Tools like yacc can be used to generate the required control mechanisms.
A great deal of code then doesn't need to be written.
Doing this requires an analysis of the problem far
deeper than most programmers will do or managers
wait for. Implementing requires a discipline most
programmers will not endure.
It works, and it works well, and it can be checked
by peer review and debug traces, and fixes can be
tested with message replay. Programs built this way
tend not to need repair and repairs tend to be simple,
if not necessarily easy.
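
Message replay can be sketched along these lines. This is a hypothetical
Python illustration only: the step function, the state names, and the
recorded log format (message, state reached) are assumptions for the example,
not a description of any particular program.

```python
# Hypothetical step function: (state, message) -> (result code, next state).
def step(state, message):
    if state == "IDLE" and message == b"HELLO":
        return "OK", "WAITING"
    if state == "WAITING" and message.startswith(b"DATA "):
        if len(message) > len(b"DATA "):
            return "OK", "DONE"
        return "BAD_INPUT", "WAITING"
    return "ABORTED", "FAILED"

def replay(recorded, step_fn, start="IDLE"):
    """Feed recorded messages to step_fn and report where it diverges.

    recorded is a list of (message, state reached) pairs from a debug trace.
    Returns a list of (index, message, expected state, actual state).
    """
    state, mismatches = start, []
    for index, (message, expected_state) in enumerate(recorded):
        _result, state = step_fn(state, message)
        if state != expected_state:
            mismatches.append((index, message, expected_state, state))
            state = expected_state  # resync so later entries stay comparable
    return mismatches

# A trace captured from a known-good run (made up for this example).
RECORDED = [
    (b"HELLO", "WAITING"),
    (b"DATA ", "WAITING"),
    (b"DATA abc", "DONE"),
]
```

Running `replay(RECORDED, step)` against the fixed code should return an
empty list; any entry in the result points at the first message where the new
behavior differs from the recorded one.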
geoff steckel