This and other RFCs are available on the web at
  http://dev.perl.org/rfc/

=head1 TITLE

Short-circuiting built-in functions and user-defined subroutines

=head1 VERSION

  Maintainer: Garrett Goebel <[EMAIL PROTECTED]>
  Date: 6 Sep 2000
  Last Modified: 15 Sep 2000
  Mailing List: [EMAIL PROTECTED]
  Number: 199
  Version: 4
  Status: Frozen

=head1 ABSTRACT

Allow built-in functions and user-defined subroutines to catch exceptions
and act upon them, particularly those emanating from control blocks. Put
another way, allow blocks to simultaneously exit early and return a value.

=head1 DESCRIPTION

=head2 OVERVIEW

Currently there is no practical and flexible way to short-circuit
built-in functions like C<grep> and C<map>, which take a block as a
parameter. A prime example that was raised is:

  my $found = grep { $_ == 1 } (1..1_000_000);

Under Perl 5, one would need to write something like:

  eval { grep { $_ == 1 and die "$_\n" } (1..1_000_000) };
  chomp(my $found = $@);

Under the current proposal this might be written as:

  my $found = grep { $_ == 1 and return $_ && last grep } (1..1_000_000);

or alternatively:

  my $found = grep { $_ == 1 and abort grep $_ } (1..1_000_000);

It is the opinion of the author that, while the proposal saves little
typing, it is at least an improvement in clarity. Further, this
proposal introduces semantics which will make other tasks easier.

=head2 DEFINITION OF THE PROBLEM

=over 4

=item 1

How do you simultaneously exit early and return a value?

  reject the current element and continue
  accept the current element and continue
  reject the current element and abort
  accept the current element and abort
  reject the current element and retry
  accept the current element and retry

=item 2

How do you do that *and* be consistent with existing semantics?

=back

=head2 OPTIONS AND ALTERNATIVES

There has been much discussion, and many ideas have been exchanged, on
the best way to achieve this.
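
For reference, the Perl 5 C<eval>/C<die> workaround from the overview can
be wrapped in a reusable helper. The following is only a minimal sketch
under stated assumptions, not part of any of the proposals: the names
C<first_match> and C<FirstMatch::Found> are hypothetical and chosen purely
for illustration. It dies with a blessed marker rather than a plain string,
so that the helper can tell its own short-circuit apart from a genuine
error raised inside the block.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of RFC 199: return the first element of the
# list for which the block is true, abandoning the scan once it is found.
sub first_match(&@) {
    my ($test, @list) = @_;

    my $completed = eval {
        # grep aliases $_ to each element; the block sees it via the global $_.
        # The blessed marker carries a copy of the matching element out of grep.
        my $ignored = grep {
            $test->() and die bless([$_], 'FirstMatch::Found')
        } @list;
        1;
    };
    return if $completed;    # scanned the whole list without a match

    my $err = $@;
    return $err->[0] if ref($err) eq 'FirstMatch::Found';
    die $err;                # a genuine exception from the block: re-raise it
}

my $calls = 0;
my $found = first_match { $calls++; $_ == 3 } 1 .. 1000;
print "found $found after $calls calls\n";
```

Run as written, this should print C<found 3 after 3 calls>. Note that the
sketch only saves block evaluations: the argument list is still built and
copied in full before the scan starts, and the marker-class bookkeeping is
exactly the kind of boilerplate the proposals aim to make unnecessary.
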
The discussion has ranged from specific solutions which would only
apply to the C<grep> and C<map> built-ins, to more general ones such as
unifying the exception syntax for loop, code, and bare blocks. There
has also been a vocal contingent protesting that all of the suggestions
attempt to shoehorn too much new functionality into existing semantics
and that no one has yet presented any workable solutions.

The current ideas tend to converge around the following three options:

=over 4

=item 1

Grant code blocks their names as labels, and add new keywords for code
block control: C<continue>/C<abort>/C<retry>:

  # short-circuit after first acceptance...
  $found = grep { ($_ == 1) and abort grep $_ } (1..1_000_000);

  # short-circuit after first rejection...
  $found = grep { ($_ == 1) and abort grep } (1..1_000_000);

=item 2

Grant code blocks their own names as labels, and the ability to catch
C<next>/C<last>/C<redo> exceptions which explicitly use those labels.

  # short-circuit after first acceptance...
  @smalls = grep { ($_ == 1) and return $_ && last grep } (1..1_000_000);

  # short-circuit after first rejection...
  @smalls = grep { ($_ == 1) and last grep } (1..1_000_000);

=item 3

The built-in function's operation should catch exceptions thrown from
the block, where the exception thrown is true or false (alternative: 1
or false). Example:

  # short-circuit after first acceptance...
  $found = grep { ($_ == 1) and die 1 } (1..1_000_000);

  # short-circuit after first rejection...
  $found = grep { ($_ == 1) or die 0 } (1..1_000_000);

=back

=head2 INDECISIONS: BY THE PRICKINGS OF MY THUMB...

Detractors would argue:

=over 4

=item 1

First Proposal: Add code labels and new keywords:
C<continue>/C<abort>/C<retry>

=over 4

=item *

Aversion to taking any more new keywords

=back

=item 2

Second Proposal: Add code labels, and let code blocks use
C<next>/C<last>/C<redo> statements which explicitly use those labels.


=over 4

=item *

Too complicated, changes too much, attempts to shoehorn too many
semantic changes into the existing syntax/keywords

=back

=item 3

Third Proposal: Extend use of C<die>

=over 4

=item *

Built-ins shouldn't catch user-defined exceptions. How do you raise a
serious exception within a control block if you weaken C<die> into
something that can be automatically caught by built-ins? I.e., no good
support for multi-level exceptions.

=item *

Doesn't provide a very intuitive syntax for doing the same type of
C<next>/C<last>/C<redo> control as is available with loop blocks.

=back

=back

The author shares many of the detractors' opinions of the third
proposal, and would prefer either the first or second to be
implemented. The reason I prefer one of these two is that I believe
they are more intuitive, flexible, and semantically rich. As such, the
remainder of this RFC concerns itself with issues pertaining to the
first two proposals.

=head2 FINER DETAILS AND OTHER ISSUES

=head3 Issues common to the first and second proposals:

=over 4

=item *

Built-ins and subroutines automatically get their name as a label

It might therefore follow that it would be good to define C<goto foo>
to have the same semantics as C<goto &foo> when the label referred to
is a function, namely to pass C<@_>. It would then be possible, if
desired, to deprecate C<goto &foo>.

=item *

To support multi-level escaping, explicit named or numeric labels may
be used. Numeric labels specify the level of nesting to be escaped,
similarly to C<caller>: 0 refers to the current code block, 1 to the
caller's, and so forth. C<undef> may also be used to refer to the
current code block. Regardless of whether or not this proposal is
implemented, it might be nice to add the same numeric label semantics
when labels are used with loop control statements.

Examples:

  sub func1 { @_ ? &func2 : abort undef, 0 }
  sub func2 { !defined $_[0] ? abort func1, 0 : &func3 }
  sub func3 { $_[0] eq 'foo' ? abort 2, 0 : 1 }

=item *

The idea of being able to both short-circuit a block and return a value
may create confusion as to the context in which the value is being
returned. I've slightly modified one of John Porter's examples, which
illustrates the various blocks and their controls:

  sub foo {                      # Code Block
    eval {                       # Bare Block
      for (...) {                # Loop Block
        last && return 1;        # Note: all
        die  && return 1;        # of these go
        abort 1, 1;              # to different
        return 1;                # places
      }
    };
  }

This example doesn't confuse the author. But that isn't to say that
others won't find it confusing.

=back

=head3 Issues specific to adding code block control statements

=over 4

=item *

When using code block control statements, labels are required but
return values are not. Making the label mandatory is necessary to
support optionally passing a return value.

=back

=head3 Issues specific to adding support for loop control syntax to code control

=over 4

=item *

Let C<return $value && next LABEL> be used to support both
short-circuiting and returning a value

There was some consensus that C<return> is a good fit for returning a
value from a control block. But it doesn't support multi-level
escaping. Therefore the use of loop control syntax is tacked on. The
question becomes: if C<return> and C<last> are both exceptions which
exit the block early, how do you catch them both and handle them
appropriately? Is C<&&> sufficient to the cause? From the author's
perspective it looks fine and dandy. But I have little or no idea how
this would work in regard to Perl internals.


=item *

Need to stop calling them loop control statements and start referring
to them as short-circuit statements, or as statements commonly used
with loop control

=item *

Optionally, we might also consider allowing people to both raise an
exception and return a value with C<die && return $value>

=back

=head2 BACKWARD COMPATIBILITY ISSUES

If loop control syntax is grafted on for use with code blocks, then a
Perl 5 -> 6 translator would need to explicitly label all loop blocks
and loop control statements.

=head1 IMPLEMENTATION

I haven't a clue.

=head1 REFERENCES

Too many contributors to note them all down. The content of this RFC
owes heavily to list discussions. The author has simply tried to pick
and choose the best and most easily grasped ideas and analyses from the
list.

=over 4

=item RFCs that require this functionality

RFC 76: Builtin: reduce

=item Ideas that offer alternative routes to similar functionality:

RFC 22: Control flow: Builtin switch statement

RFC 31: Subroutines: Co-routines

RFC 123: Builtin: lazy

RFC 225: Data: Superpositions

=back