This and other RFCs are available on the web at
  http://dev.perl.org/rfc/

=head1 TITLE

object neutral error handling via exceptions

=head1 VERSION

  Maintainer: Glenn Linderman <[EMAIL PROTECTED]>
  Date: 16 Aug 2000
  Last Modified: 22 Aug 2000
  Version: 2
  Mailing List: [EMAIL PROTECTED]
  Number: 119

=head1 ABSTRACT

Revisit what error handling and exceptions are for, to determine the set
of desirable unit operations, rather than starting with a bundle of stuff
from another language and trying to make it Perlish.

=head1 CHANGES

Addition of "always" clause.

Clarified the rules for catching: catch statements prior to the current
point of execution in a given scope are not candidate targets for throws;
only catch statements after the current point of execution in a given
scope are.

Clarified that when a catch statement is encountered in the normal flow,
it is not executed.

Added a section discussing the use of this "always-on" mechanism by
people who prefer error returns.

Added a few notes regarding implementation.

Added a section regarding differences between RFC88v2d5 and RFC119

=head1 DESCRIPTION

There are numerous RFCs regarding a complete bundle of exception handling
mechanisms.  Most of them are modeled after some other language's
exception handling mechanism, adapted somewhat to Perl, and somewhat to
the goals of the author.  While this is not all bad, as the problems being
faced were faced in the other languages as well, it is not necessarily all
good, either.

This RFC examines some of the incentives behind C++ exceptions, both the
structure of the code and the structure of the exception object, then
examines the goals of an exception mechanism, then examines some
techniques that could be used to reach the goals.  The result can be made
to look a lot like the C++ exception mechanism if desired, but can be much
more powerful when all its features are used.

So this leads to the following "head2"s:

  - C++ exceptions
  - Goals of exceptions
  - Techniques for exceptions
  - Results
  - C++-like Usage
  - Conversion wrappers

I focus on C++ rather than Java, because Java (pardon me, Java-heads) is
just an attempt to use the best parts of C++ without all the baggage of C,
so while most of this could have been changed in Java, it wasn't.  This
made Java easy to learn for C users who'd read about C++, and for C++
users.  This didn't make Java a significantly better language than C++,
although they were able to remove some of the worst C++ language traps.
To excel, you need to not only remove the worst, but add some best.  I
think that's a goal of Perl.

While Graham's Error.pm module is a valiant attempt to include C++-like
exception handling in Perl, it has various deficiencies (discussed by
others) that can be attributed to its being an add-on to Perl, just like
C++ exception handling has various deficiencies because of being an
add-on to C.

=head2 C++ exceptions

Remember that C++ exceptions were built first as a preprocessor for C.
Therefore, the mechanisms used had to exist in C.  Stack unwinding could
therefore only be done by using the only non-local goto facility supported
by C: longjmp.  This forced a number of decisions about the design of
exceptions, not all of which are good.

I note in passing that ANSI Forth defines catch and throw which use single
cells as parameters... so not all usage of catch and throw is related to
object techniques.

=head3 Keyword try

First, longjmp doesn't work without a setjmp, and setjmp must be called
prior to longjmp.  This is the basic justification for try: it calls
setjmp at a point within the scope of the code for which the exception
handling mechanism is to be activated.

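
Perl 5's own eval BLOCK and die follow the same ordering constraint, which
may make the point concrete: the eval must already be active on the call
stack before die can transfer control to it.  The following is only a
minimal sketch in Perl 5, not the mechanism proposed by this RFC; the sub
names inner, middle and outer are made up for illustration.

```perl
use strict;
use warnings;

my @trace;

# Three nested call levels; the innermost one fails.
sub inner  { push @trace, "inner";  die "deep failure\n"; }
sub middle { push @trace, "middle"; inner();  push @trace, "not reached"; }
sub outer  { push @trace, "outer";  middle(); push @trace, "not reached"; }

# eval BLOCK marks the recovery point, much as setjmp does in C.
my $ok = eval { outer(); 1 };

# die unwound all three call levels and control resumed after the eval.
my $error = $@;
push @trace, "recovered";

print join( " -> ", @trace ), "\n";
print "error: $error";
```

None of the intermediate subs tests for the error, yet neither "not
reached" entry is recorded, which is the non-local exit that try and
setjmp exist to set up.
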
Some attempts have been made to justify the use of the try keyword as
aiding programmer comprehension of the scope of the try block, and perhaps
it does this in languages where some code may be in the scope of the
exception handling mechanism, and other code may not be.  It would seem,
however, that the best implementation of an exception handling mechanism
would be that all code is in scope of the exception handling mechanism, so
that exceptions cannot be ignored, other than explicitly.  Perl's die is
of that flavor: very hard to ignore, except explicitly.

=head3 Scoping problems

Let's presume the example often cited, of wishing to close a file handle
during the unwind.  Here's some C++ for that:

    FILE *handle;
    try {
        handle = fopen ( ... );
        ...
    }
    catch ( ... ) {
        fclose ( handle );
        throw;
    }

Note that "handle" has to be defined outside the scope of the try, because
catch cannot see the scope defined by the try block, and is completely
unable to recover from problems that are not explicitly hoisted outside of
the try block.

=head3 Control flow problem #1

Here's some icky C error handling code.  There are 3 errors to handle, so
that the pattern becomes obvious, but I tried to keep them simple (all
open errors, to build on the case above).  They can get much more complex
in practice, when the code to deal with an error gets more complex.

    int returned_error;
    FILE *handle1, *handle2, *handle3;
    if ( ! ( handle1 = fopen ( ... ))) {
        return errno;
    }
    if ( ! ( handle2 = fopen ( ... ))) {
        returned_error = errno;
        fclose ( handle1 );
        return returned_error;
    }
    if ( ! ( handle3 = fopen ( ... ))) {
        returned_error = errno;
        fclose ( handle1 );
        fclose ( handle2 );
        return returned_error;
    }
    ...

Here's an icky attempt to reduce the redundant error code:

    int returned_error;
    FILE *handle1, *handle2, *handle3;
    if ( ! ( handle1 = fopen ( ... ))) {
        returned_error = errno;
      handle1_return:
        return returned_error;
    }
    if ( ! ( handle2 = fopen ( ... ))) {
        returned_error = errno;
      handle2_return:
        fclose ( handle1 );
        goto handle1_return;
    }
    if ( ! ( handle3 = fopen ( ... ))) {
        returned_error = errno;
      handle3_return:
        fclose ( handle2 );
        goto handle2_return;
    }
    ...

While this solves the redundancies of the cleanup code, the cleanup code
for handle1 sits next to the code that attempts to open handle2, rather
than being bundled with the code that opens handle1.  C never claimed to
be OO, but even without OO, this is icky.

Translating to C++ doesn't help much.  Assuming (not accurately) that
C++'s fopen throws an error if it fails, to simulate proposals that Perl's
open should do exactly that:

    FILE *handle1 = NULL, *handle2 = NULL, *handle3 = NULL;
    try {
        handle1 = fopen ( ... );
        handle2 = fopen ( ... );
        handle3 = fopen ( ... );
        ...
    }
    catch ( ... ) {
        if ( handle1 ) fclose ( handle1 );
        if ( handle2 ) fclose ( handle2 );
        if ( handle3 ) fclose ( handle3 );
        throw;
    }

Some would like this, because it removes all the error handling code from
the control flow.  But to support that, the handles must be outside the
try block (as noted in the previous section) so they can be seen by the
catch block; they must be initialized even if never used (not a bad
programming practice, but certainly not needed in the C examples) so that
the catch block doesn't do stupid things; and the code to clean up a
handle is far removed from the code that sets up the handle.  Perl,
fortunately, initializes all its variables to undef, so we are saved from
that aspect of C/C++.

=head3 Control flow problem #2

The above examples all dealt with cases where the error is simply
rethrown, using the "catch ( ... )" as a "finally" block per RFC 88, or a
"continue" block per RFC 63.  When actually attempting to handle errors,
we discover that any commonality between handling different errors results
in duplicate code (or additional subroutines or gotos):

    FILE *handle1 = NULL, *handle2 = NULL, *handle3 = NULL;
    try {
        handle1 = fopen ( ... );
        handle2 = fopen ( ... );
        handle3 = fopen ( ... );
        ...
    }
    catch ( error_type_1 ) {
        if ( handle1 ) fclose ( handle1 );
        if ( handle2 ) fclose ( handle2 );
        if ( handle3 ) fclose ( handle3 );
        // ... report error type 1, handle it
    }
    catch ( error_type_2 ) {
        if ( handle1 ) fclose ( handle1 );
        if ( handle2 ) fclose ( handle2 );
        if ( handle3 ) fclose ( handle3 );
        // ... report error type 2, handle it
    }
    catch ( ... ) {
        if ( handle1 ) fclose ( handle1 );
        if ( handle2 ) fclose ( handle2 );
        if ( handle3 ) fclose ( handle3 );
        throw;
    }

Or you could:

    void help_clean ( FILE * handle1, FILE * handle2, FILE * handle3 )
    {
        if ( handle1 ) fclose ( handle1 );
        if ( handle2 ) fclose ( handle2 );
        if ( handle3 ) fclose ( handle3 );
    }

    FILE *handle1 = NULL, *handle2 = NULL, *handle3 = NULL;
    try {
        handle1 = fopen ( ... );
        handle2 = fopen ( ... );
        handle3 = fopen ( ... );
        ...
    }
    catch ( error_type_1 ) {
        help_clean ( handle1, handle2, handle3 );
        // ... report error type 1, handle it
    }
    catch ( error_type_2 ) {
        help_clean ( handle1, handle2, handle3 );
        // ... report error type 2, handle it
    }
    catch ( ... ) {
        help_clean ( handle1, handle2, handle3 );
        throw;
    }

This removes the error handling code even further from the setup code,
still requires redundancy among the catch phrases, and introduces new
functions dealing only with cleanup.

Adding an RFC 88 finally clause to C++ would help if and only if (if I
understand it correctly) the handles can be closed _at the end_ of the
cleanup process.  That would produce:

    FILE *handle1 = NULL, *handle2 = NULL, *handle3 = NULL;
    try {
        handle1 = fopen ( ... );
        handle2 = fopen ( ... );
        handle3 = fopen ( ... );
        ...
    }
    catch ( error_type_1 ) {
        // ... report error type 1, handle it
    }
    catch ( error_type_2 ) {
        // ... report error type 2, handle it
    }
    catch ( ... ) {
        throw;
    }
    finally {
        if ( handle1 ) fclose ( handle1 );
        if ( handle2 ) fclose ( handle2 );
        if ( handle3 ) fclose ( handle3 );
    }

I'm not sure how the "catch ( ...
)"'s rethrow would interact with the finally clause; that seems to be an
area of discussion regarding the differences between RFC 63 and RFC 88.

=head2 Goals of exceptions

This is my list so far; feel free to suggest more.

In the examples thus far, each "fopen" call could independently fail, but
the overall program appears to need to open all three, or none, in a
somewhat atomic manner.  While the code to deal with a single fopen call
and the possibility that it fails is straightforward, the complexity of
the situation results from the polynomial explosion of code and branches
resulting from increasing numbers of operations.  This is my justification
for the first 6 items on the list.

While I have nothing against OO techniques (I've found C++ OO features
useful for a compiled language), it is somewhat cumbersome to deal with OO
for small projects.  Perhaps some of the "make everything an object" RFCs
for Perl6 will sidestep that cumbersomeness, and moot this point.
However, until or unless that is achieved, I'd rather not be forced to use
objects to achieve exception handling.  On the other hand, when building a
large system, having an exception object might be helpful.  This is my
justification for item 7.

  1) Keep the cleanup code near the setup code, to keep it
     understandable.

  2) Keep the cleanup code in the same scope as the setup code, to avoid
     hoisting variables into higher scopes.

  3) Avoid redundancy and complex control flow in the visible cleanup
     code paths.

  4) Achieve a structured form of non-local goto to allow exiting
     multiple levels of subroutine calls without coding tests of error
     conditions at every level within the stack.

  5) Achieve good default reporting of uncaught exceptions.

  6) Make exception handling the default (or only) method of operation
     for Perl code.

  7) Permit use of exception objects, but don't require them.

=head2 Techniques for exceptions

=head3 Technique for goals 1-3

Add new except and always clauses that can modify a statement or a block:

    statement1 except statement2 always statement3;

Any of statement1, statement2 or statement3 can be made into blocks, with
the result that scoping problems resurface, but often they wouldn't need
to be blocks.

If execution of the containing scope reaches statement1, it is executed as
normal.  Because it contains an always clause, statement3 is pushed on the
stack of cleanup code to be executed when the scope exits, and because it
contains an except clause, statement2 is pushed on the stack of cleanup
code to be executed if an exception occurs.  There is logically only one
stack of cleanup code, so the order of execution of the cleanup statements
is always consistent, although some of it is conditional.  For the
statement above, statement2 would always be executed before statement3 if
an exception occurs.  Statement2 is omitted if no exception occurs.
Statement3 is omitted only if the scope exits prior to statement1 being
executed.

For example (I'll use Perl language examples henceforth):

    $handle1 = open ( "<file1" ) always close ( $handle1 );
    throw "Error opening file1" if ! defined $handle1;
    $handle2 = open ( "<file2" ) always close ( $handle2 );
    throw "Error opening file2" if ! defined $handle2;
    $handle3 = open ( "<file3" ) always close ( $handle3 );
    throw "Error opening file3" if ! defined $handle3;

If you assume that Perl6 open gets enhanced to throw an exception when it
fails, you can simplify this to:

    $handle1 = open ( "<file1" ) always close ( $handle1 );
    $handle2 = open ( "<file2" ) always close ( $handle2 );
    $handle3 = open ( "<file3" ) always close ( $handle3 );

=head3 Technique for goal 4

Add a new throw clause to achieve a structured non-local goto.  The throw
statement takes a list as a parameter, and can be qualified with the usual
conditionals.
So you can

    throw "Error opening file1";

or (printf-like throw)

    throw "Error opening file %s", "file1";

or (OO throw)

    throw new Exception::Error ("Error opening file", "file1");
    throw new Exception::Success ("The answer is", 17 );

or (rethrow)

    throw;  # throws @_

Definition: OO throw: a throw that throws a single object reference
parameter.

Now a non-local goto has to have a target, so that is provided by the
catch statement, which is a sub-like block.  There are rules for finding
the appropriate catch statement, listed later.  A catch statement gets a
new @_ which is initialized to the list supplied to the throw.  These
catch examples all use die to make the errors fatal, but if die is not
used, execution would continue with the next non-catch statement after the
executed catch statement.

    catch {
        die join ( ", ", @_ );
    }

or (printf-like catch)

    catch {
        my ( $msg, @parm ) = @_;
        die sprintf "$msg\n", @parm;
    }

or (conditional catch)

    catch ( $_[0] =~ /^Error/ ) {
        die join ( ", ", @_ );
    }

or (simple OO catch)

    catch {
        die $_[0]->message;
    }

or (complex OO catch)

    catch Exception::Error {
        die $_[0]->message;
    }
    catch Exception::Success {
        print $_[0]->message;
        exit ( 0 );
    }
    catch Exception {
        die "unexpected:" . sprintf $_[0]->message;
    }

When a catch statement is encountered in the normal flow of control
(falling from one statement to the next), it is not executed.  Rather, it
is removed from the list of candidate catch statements that might be used
as targets of a subsequent throw.

What about the rules for determining which catch statement is the target
of a particular throw?  A combination of lexical and dynamic scope rules,
which aren't that different from those for C++.

Definition: appropriate catch statement

  case 1: for an OO throw, an appropriate catch statement is one that
          lists the class of the reference thrown.
  case 2: for other throws, an appropriate catch statement is one that
          doesn't list a class name, and either has no expression, or has
          an expression which is true when evaluated (with @_ referring to
          the parameters thrown).

Catch selection rules:

  - If the scope containing the throw contains catch statements, they are
    examined in source code order to determine if any are appropriate.
    The first appropriate catch statement after the current execution
    point in the scope is used.

  - If rule 1 doesn't yield an appropriate catch statement, all except
    and always clauses for that scope are logically popped off the
    cleanup stack and executed (in reverse order, that's why it is
    logically a stack).

  - Each lexically larger scope within the sub is examined in like
    manner, using the above two rules.  There is one exception to this
    rule: if the throw is within the scope of a catch statement, the
    scope containing that catch statement is not examined to find a
    handler for such embedded throws.  See example below.

  - If the sub contains no appropriate catch statement, the above three
    rules are used for each sub found on the call stack.

  - There are two implicit catch phrases at the end of the outer scope:

      catch UNIVERSAL { die "uncaught OO throw: $_[0]"; };
      catch { die "uncaught throw: @_"; }

Example for rule three's exception:

    {
        # some code, in which it, or something it calls, does a "throw 37"
        catch ( $_[0] == 37 ) {
            # the throw is caught here
            {
                # down inside some nested scope or call, someone does
                throw 38;
            }
            # if there were a "catch ( $_[0] == 38 ) { ... }" here, it would
            # be allowed to catch the throw 38.
        }
        catch ( $_[0] == 38 ) {
            # this should not catch throws from within catch clauses at the
            # same lexical scope, because a catch at this lexical level is
            # already in progress.
        }
    }
    catch ( $_[0] == 38 ) {
        # but this one should catch it!
    }

The whole point being that only one catch in a scope should ever execute
until it is complete, and that errors within a catch statement or block
should be caught within that statement or block, or be passed to outer
scopes to be caught there.

=head2 Results

New statements throw & catch, with semantics similar to their counterparts
in other languages.

New clauses except and always which localize cleanup code near, and
potentially in the same scope as, the corresponding setup code.

Two new implicit catch phrases.

Orthogonality with eval/die for compatibility.

Support for both OO and structured programming.

=head2 C++-like Usage

When putting all the above features together, it is possible to construct
syntax that looks very similar to the equivalent C++ syntax.  This might
seem familiar to some, as a way to ease into the more powerful Perlish
syntax proposed here.  An example along the lines of the earlier examples
would be:

    // repeat of earlier C++ example
    try {
        handle1 = fopen ( ... );
        handle2 = fopen ( ... );
        handle3 = fopen ( ... );
        ...
    }
    catch ( error_type_1 ) {
        // report error type 1, handle it
        if ( handle1 ) fclose ( handle1 );
        if ( handle2 ) fclose ( handle2 );
        if ( handle3 ) fclose ( handle3 );
    }
    catch ( error_type_2 ) {
        // report error type 2, handle it
        if ( handle1 ) fclose ( handle1 );
        if ( handle2 ) fclose ( handle2 );
        if ( handle3 ) fclose ( handle3 );
    }
    catch ( ... ) {
        if ( handle1 ) fclose ( handle1 );
        if ( handle2 ) fclose ( handle2 );
        if ( handle3 ) fclose ( handle3 );
        throw;
    }

Corresponding Perl example using some of the features of this RFC:

    # Note, try keyword is not needed, because exception handling is always
    # available.  The "try" block is only needed to be C++-like in bundling
    # together all the "except" processing (which is better left
    # distributed IMHO).
    my ( $handle1, $handle2, $handle3 );
    {
        $handle1 = open ( ... );
        $handle2 = open ( ... );
        $handle3 = open ( ... );
        ...
    }
    always {
        close ( $handle1 );
        close ( $handle2 );
        close ( $handle3 );
    }
    catch error_type_1 {
        # report error type 1, handle it
    }
    catch error_type_2 {
        # report error type 2, handle it
    }
    catch {
        throw;
    }

It should be noted that the last catch phrase in both examples above could
be omitted without semantic change, both in C++ and when using the
facilities of this RFC.  It is just shown to indicate how to write a
phrase which catches everything else, and how to code an explicit rethrow
of the same exception.

This same example could be shortened as follows, using the complete power
of the syntax in this RFC.

    my $handle1 = open ( ... ) always close ( $handle1 );
    my $handle2 = open ( ... ) always close ( $handle2 );
    my $handle3 = open ( ... ) always close ( $handle3 );
    ...
    catch error_type_1 {
        # report error type 1, handle it
    }
    catch error_type_2 {
        # report error type 2, handle it
    }

=head2 Conversion wrappers

Perl5 doesn't have structured exceptions; eval and $SIG{"__DIE__"} offer
only primitive ways to catch "die" statements.  Hence, most extant code
uses some form of error handling based on returning the success or failure
of a called subroutine.  This doesn't get broken by this RFC, neither does
it get "fixed".  Extant code must be modified to leverage the simpler
interfaces and powerful features of this RFC.

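
For comparison, both directions of conversion can already be approximated
in Perl 5 with eval and die.  This is a hedged sketch rather than the
syntax proposed by this RFC: api_throws, api_returns_error, wrap_as_return
and wrap_as_throw are hypothetical names invented for the example, and the
"error string or empty string" return convention is an assumption.

```perl
use strict;
use warnings;

# Hypothetical API that reports failure by dying (exception style).
sub api_throws {
    my ( $name ) = @_;
    die "cannot open $name\n" if $name eq "missing";
    return "handle for $name";
}

# Hypothetical API that reports failure by return value (old style):
# an error string on failure, an empty string on success.
sub api_returns_error {
    my ( $name ) = @_;
    return $name eq "missing" ? "cannot open $name" : "";
}

# Exception style -> error-return style: trap the die, hand back the error.
sub wrap_as_return {
    my @args = @_;
    my $result;
    my $ok = eval { $result = api_throws( @args ); 1 };
    return $ok ? ( $result, undef ) : ( undef, $@ );
}

# Error-return style -> exception style: turn a true error value into a die.
sub wrap_as_throw {
    my @args  = @_;
    my $error = api_returns_error( @args );
    die "$error (context information)\n" if $error;
    return;
}

my ( $handle, $err ) = wrap_as_return( "missing" );
print "returned error: $err";

my $ok     = eval { wrap_as_throw( "missing" ); 1 };
my $thrown = $ok ? "" : $@;
print "thrown error: $thrown";
```

The eval { ...; 1 } idiom is used so that a legitimately false return
value from the wrapped call is not mistaken for a failure.
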
For people that want to use a module or sub that uses the features of this
RFC in the "old style" error checking manner, may I suggest the following
wrapper:

    sub wrapAPI
    {
        API ( @_ );
        catch { return $_[0]; }
    }

For people that want to use a module or sub that doesn't use the features
of this RFC but want their code to use such features, may I suggest the
following wrapper:

    sub wrapAPI
    {
        if ( my $error = API ( @_ )) {
            throw $error, "context information";
        }
    }

Clearly these wrappers would have to be customized to the extent that they
might be context sensitive (hopefully RFC 21, if implemented, gets extended
to implement an appropriate way to invoke a called sub in the same or
equivalent context as the caller sub, to ease the burden of writing such
wrapper subs), and to the extent that errors are returned via techniques
other than a scalar return value.  But the point of the above wrapAPIs is
to show a technique for converting exception based interfaces to and from
non-exception based interfaces.

=head1 IMPLEMENTATION

Not much clue.  Clearly this extends the amount of code that might be
executed in an order other than the default linear order.

Some thoughts on things that might be useful.

At the beginning of a new scope, all the catch blocks in the scope could
be unshifted onto a list of candidate catch blocks for possible throws.
Then as each catch block is encountered during the normal execution order,
it could be shifted off, because it is no longer a candidate.  However,
this is probably inefficient, because unless a throw occurs, it is work in
vain.  A more efficient implementation would probably search a per-scope
list of catch blocks for candidates, with the relative position of the
catch block being one of the parameters to the search algorithm.

The except and always clauses are described as a stack, and could possibly
be implemented that way.
While that is a simple way to conceptualize them, it is probably more
efficient to regroup these clauses for each scope (causing the generated
code to be in a different order than the source code), and place the
clauses in reverse order.  Then execution of those clauses could be almost
normal, but the point of entry to those regrouped clauses would vary based
on the point in the code at which an exception occurs, or a branch or
return is encountered that leaves the scope.

=head1 Differences between RFC88v2d5 and RFC 119

While this comparison is to RFC88v2d5, references will be made simply as
RFC 88, for simplicity.

=head2 What gets thrown

RFC 88 specifies that an exception object always exists, and that it is of
some particular type.  Should anything else be thrown, it gets stringified
and wrapped into an exception object.

RFC 119 doesn't make such a specification, but a similar technique could
be used to implement and extend RFC 119.  However, there are a couple of
gotchas:

RFC 119 wants to make available to the catch block exactly the same list
of parameters supplied to throw.  This is prevented by RFC 88's
stringification and concatenation of parameters.  Were RFC 88 extended
such that its message parameter could be a list reference, then it could
store the actual parameters to the throw clause.  The stringification and
concatenation could be deferred until the user requests such a
stringification by calling some API to obtain "the message".  This would
allow the original parameters to throw to be available to the catch block.

RFC 119 makes those parameters available to the catch block via @_,
whereas RFC 88 doesn't presently make the parameters available.  Earlier
versions of RFC 88 used @_ for other purposes, but at least in draft 5 @_
is not used.  A possible way to reconcile RFC 88 with RFC 119 would be to
extend RFC 88 to set @_ to the saved list of parameters from throw
(presuming the extension in the prior paragraph as well).

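
The extension suggested in this section, keeping the original throw
parameters and deferring stringification until a message is requested,
might look roughly like this in Perl 5.  It is only a sketch under stated
assumptions: My::Exception and its params and message methods are
hypothetical, not the exception class defined by RFC 88, and a printf-like
first parameter is assumed.

```perl
use strict;
use warnings;

# Hypothetical exception class, NOT the class defined by RFC 88: it keeps
# the original throw parameters and formats a message only on request.
package My::Exception;

sub new {
    my ( $class, @params ) = @_;
    return bless { params => [ @params ] }, $class;
}

# The untouched list that was passed to the constructor.
sub params {
    my ( $self ) = @_;
    return @{ $self->{params} };
}

# Deferred stringification: printf-like, first parameter is the format.
sub message {
    my ( $self ) = @_;
    my ( $format, @rest ) = $self->params;
    return sprintf $format, @rest;
}

package main;

my @caught;
my $message;

eval { die My::Exception->new( "Error opening file %s", "file1" ); 1 } or do {
    my $exception = $@;
    @caught  = $exception->params;     # what a catch block would see in @_
    $message = $exception->message;    # formatted only now
};

print scalar( @caught ), " parameters preserved\n";
print "$message\n";
```

Because the list is stored untouched, a handler can choose between the raw
parameters and the formatted message, which is the reconciliation between
the two RFCs that the paragraphs above describe.
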
=head2 Control flow: except, finally, and always

RFC 88 uses the finally keyword as a subclause introducer for the try
statement.

RFC 119 uses the except and always keywords as subclause introducers for
any statement.

The basic purpose of the keywords is the same: to specify code that will
get executed if the scope is exited due to an exception.

However, there are some differences, beyond the type of statement to which
they can be attached.

RFC 119's except clause is executed only if an exception occurs, which
exits the scope containing the statement in which it is defined.

RFC 88's finally clause is executed if an exception occurs, or if flow of
control falls out of the bottom of the try statement.  It is not clear
whether the finally clause is executed if the try statement is exited via
a goto or return: the statement is made that once a try statement is
entered, it is guaranteed that the finally clause is entered, but none of
the discussion mentions exiting via goto or return.

RFC 119's always clause is executed if an exception occurs, or if flow of
control exits the scope in any other manner.

If RFC 88's finally clause is executed when the try statement is exited
via goto or return, then RFC 119 would be happy to rename its "always"
clause to "finally", and the techniques would be more uniformly named.

RFC 119 presently limits each statement to a single always clause; RFC 88
permits multiple finally clauses for a single try statement.  The
explanation about why multiple finally clauses are needed points out the
benefit of specifying such logic closer to the point of the setup code.

RFC 88 has nothing that directly corresponds to RFC 119's except clause,
although that logic could be included at the end of each catch clause.

It is probably possible to code equivalent exception handling code using
either the techniques in RFC 88 or the techniques in RFC 119.

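
The difference between except (runs only on an exception) and always (runs
on any exit) can be approximated in Perl 5 with an explicit cleanup stack.
The helper below is a sketch, not part of either RFC: with_cleanup, its
registry keys and open_files are hypothetical names, die stands in for
throw, and only normal completion and die are covered (not goto).

```perl
use strict;
use warnings;

# Hypothetical helper: runs $body with a cleanup registry, then runs the
# registered actions in reverse order of registration (stack order).
sub with_cleanup {
    my ( $body ) = @_;
    my @stack;    # entries: [ kind, code ], kind is "always" or "except"
    my %register = (
        always => sub { push @stack, [ always => $_[0] ] },
        except => sub { push @stack, [ except => $_[0] ] },
    );

    my $ok    = eval { $body->( \%register ); 1 };
    my $error = $@;

    for my $entry ( reverse @stack ) {
        my ( $kind, $code ) = @$entry;
        next if $kind eq "except" && $ok;    # except runs only on failure
        $code->();
    }

    die $error if !$ok;    # rethrow once cleanup is done
    return;
}

my @log;

sub open_files {
    my ( $fail_at ) = @_;
    with_cleanup( sub {
        my ( $on ) = @_;
        for my $n ( 1 .. 3 ) {
            die "Error opening file$n\n" if $n == $fail_at;
            push @log, "open $n";
            # always is pushed first so that except pops (and runs) first
            $on->{always}->( sub { push @log, "close $n" } );
            $on->{except}->( sub { push @log, "report $n" } );
        }
    } );
    return;
}

eval { open_files( 3 ); 1 } or print "failed: $@";
print join( ", ", @log ), "\n";
```

Each cleanup action is registered right next to its setup step, and a
single stack fixes the order: for a given step the except action runs
before the always action, and later steps are cleaned up before earlier
ones.
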
I believe that any needed exception handling logic can be expressed at
least as concisely and clearly using RFC 119 as using RFC 88.  For
example, in the section "C++-like Usage", the 2nd example looks remarkably
like RFC 88 syntax, but the 3rd example demonstrates a more concise
representation not available in RFC 88.

=head2 Default error reporting

RFC 88, together with RFC 80, defines a large collection of default
behaviors for the exception class and objects.  While RFC 119 provides the
"bare necessities" to do exception handling (a list of parameters is
passed through), the RFC 88 exception class does define a number of
implicit features that would be extremely useful for debugging, although
most of them could be considered "extra baggage" once debugging is
complete.  [Is debugging ever complete? :)]

If the suggestions in the section "What gets thrown" were taken, all those
implicit features could be merged into RFC 119.

I'd like to see two modes of operation by the exception class: "normal"
and "debug", where normal would omit many of the implicit features
(preserving the call stack, debug info, object info, etc.), and debug
could be turned on easily across all loaded modules via some command line
switch.

=head1 REFERENCES

  RFC 21: Replace wantarray with a generic want function

  RFC 63: Exception handling syntax proposal

  RFC 80: Exception objects and classes for builtins

  RFC 88: Structured exception handling mechanism

  Error.pm