This and other RFCs are available on the web at
http://dev.perl.org/rfc/
=head1 TITLE
Behavior of empty regex should be simple
=head1 VERSION
Maintainer: Mark Dominus <[EMAIL PROTECTED]>
Date: 24 August 2000
Version: 1
Mailing List: [EMAIL PROTECTED]
Number: 144
=head1 ABSTRACT
=head2 Standard Documentation
According to L<perlop>:
=over 4
=item m/PATTERN/cgimosx
=item /PATTERN/cgimosx
If the PATTERN evaluates to the empty string, the last successfully
matched regular expression is used instead.
=back
This behavior should be changed. If the PATTERN is empty, Perl should
look for the empty string. (That is, if the PATTERN is empty, it
should always match.)
=head1 DESCRIPTION
Literal empty patterns, such as:
$s =~ // ;
are not the problem here. The real problem is that the special case
is invoked for interpolated patterns also. For example,
chomp($pat = <STDIN>);
$s =~ /\Q$pat\E/;
looks to see if $pat is a substring of $s, unless $pat is empty, in
which case it matches $s against the last regex that was matched
successfully. That regex might be far away, in some other module.
If the far-away regex happened to contain backreference groups, the
backreference variables will be set accordingly.
To make this safe in Perl 5, the programmer has to write something
peculiar like
$s =~ /(?=)\Q$pat\E/;
to ensure that the regex, after interpolation, is never empty.
I propose that this 'last successful match' behavior be discarded
entirely, and that an empty pattern always match the empty string.
=head2 Alternatives
In the past it has been proposed that the special behavior be
discarded for patterns with interpolated variables, but retained for
empty patterns that appear literally in the source. So the example
above would work the way I proposed, but
$s =~ // ;
would still retain the special behavior and match $s against whatever
was the last pattern to be successfully matched.
I propose to eliminate the special behavior entirely, because it is
not useful, and because it is confusing.
=head2 The Feature Was Not Useful
The special behavior for empty patterns has never been particularly
useful. For example, you could imagine code like this:
for $pat (@patterns) {
if ($a =~ /$pat/ && $b =~ //) {
# do something
}
}
This would be more efficient than the equivalent
for $pat (@patterns) {
if ($a =~ /$pat/ && $b =~ /$pat/) {
# do something
}
}
because $pat would be compiled only once per loop instead of twice.
It is now more straightforward and efficient to do this sort of thing
explicitly with the qr// operator:
@patterns = map qr/$_/, @patterns;
for $pat (@patterns) {
if ($a =~ /$pat/ && $b =~ /$pat/) {
# do something
}
}
=head2 The Feature Was Not Useful
People sometimes propose the following use for the empty pattern
special case: They have a pattern, and many strings, and they want to
see if every string matches the pattern. This code works, but is
inefficient:
sub match_all {
my $pat = shift;
for (@_) {
return 0 unless /$pat/;
}
return 1;
}
This is because C</$pat/> must be recompiled for each string, or checked
to see whether recompilation is necessary.
This code does not work:
sub match_all {
my $pat = shift;
for (@_) {
return 0 unless /$pat/o;
}
return 1;
}
because C<$pat> changes with each call.
One solution is to use 'eval' here to generate the pattern matching
code (with C</o>) at run time.
People have sometimes tried to use C<//> here, but usually without
success. The idea is:
sub match_all {
my $pat = shift;
# load $pat into 'last successfully matched' space
for (@_) {
return 0 unless //;
}
return 1;
}
The problem here is that there is no way to designate $pat as the last
successfully matched regex without actually finding a string that
matches it. In the past people attempting this strategy have appeared
in C<comp.lang.perl.misc> asking how to find a string that matches a
given regex. As far as I know, no useful solutions have been offered.
(In fact, there may not be any such string. Consider the pattern
C</a\bx/> for example.)
A better, simpler solution to this problem is to use the C<qr> operator:
sub match_all {
my $pat = shift;
$pat = qr($pat);
for (@_) {
return 0 unless /$pat/;
}
return 1;
}
=head2 This feature has resulted in bugs
Any code that contains the innocent-looking
if (/\Q$string\E/) {
...
}
is potentially booby-trapped. Such code is common. An example of
this type appears in L<perlfaq6>.
=head1 IMPLEMENTATION
No special implementation is necessary.
=head1 TRANSLATION ISSUES
Old Perl 5 scripts that may depend on this feature may be hard to
translate unless the feature is implemented in Perl 6 also. However,
it's not clear that such translation would even be desirable.
=head1 REFERENCES
L<perlop> discussion of empty patterns, quoted above.