This and other RFCs are available on the web at
http://dev.perl.org/rfc/
=head1 TITLE
Simplifying split()
=head1 VERSION
Maintainer: Sean M. Burke <[EMAIL PROTECTED]>
Date: 30 Sep 2000
Mailing List: [EMAIL PROTECTED]
Number: 361
Version: 1
Status: Developing
=head1 ABSTRACT
Perl 5's C<split> function is messy, and should be simplified.
=head1 DESCRIPTION
Perl 5 split does five things that I think are just annoying, and
which I suggest be removed:
=over
=item 1. The first argument to split is currently interpreted as a
regexp, regardless of whether or not it actually is one. (Yes,
C<split '.', $foo> doesn't split on dot -- it's currently the same an
C<split /./, $foo>.) I suggest that split be changed to treat only
regexps as regexps, and everything else as literals.
=item 2. Empty trailing fields are currently suppressed (although a
-1 as the third argument disables this). I suggest that empty trailing
fields be retained by default.
=item 3. When not in list context, split currently splits into @_. I
suggest that this side-effect be removed.
=item 4. split ?pat? in any context currently splits into @_. I suggest
that this side-effect be removed.
=item 5. split ' ' (but not split / /) currently splits on whitespace,
but also removes leading empty fields. I suggest that this
irregularity be removed.
=back
The last three of the above points speak for themselves. I will focus
on the first two.
Most notably, I suggest that Perl 6 C<split('|', ...)> should work as
most people expect -- splitting on a literal bar. (Under Perl 5,
C<split('|', ...)> is synonymous with C<split(/|/, ...)> -- i.e.,
split on nullstring or nullstring [sic].)
So I suggest:
Perl 5: split /\|/, ...
be synonymous with (and be better written as)
Perl 6: split '|', ...
# altho split /\|/, $bar... remains valid
And as to the second point, the removal of trailing blanks, I suggest:
Perl 5: @x = split /:/, $bar, -1;
be synonymous with
Perl 6: @x = split ':', $bar;
If you want to remove trailing fields, under Perl 6 you should have to
do it explicitly:
Perl 5: @x = split /:/, $bar;
be synonymous with
Perl 6: @x = split ':', $bar;
while(@x and !length $x[-1]) { pop @x }
I believe that the current behavior of removing trailing empty fields is
unintuitive and surprising to learners; nothing about the concept of
splitting a string into a list suggests removing trailing empties.
(Moreover, I find that when I need to remove empties, it's not just the
trailing ones; so the current behavior is rarely just what I want.)
=head1 IMPLEMENTATION
I'll leave the C-coding details to the usual, capable implementers.
But I will note one minor complication with my first suggestion (that
literals and regexps be distinguished). Consider:
Perl 6: @x = split $foo, $bar;
I suggest that the correct approach is to treat $foo's value as a
literal, unless it holds an object of class Regexp (or a class derived
from it?), in which case it should be treated as if the above were:
Perl 6: @x = split qr/$foo/, $bar;
In other words, in such cases it is not possible to know at compile time
whether a given "split" operator means literal-split or regexp-split.
I note that such cases are rare.
=head1 ALTERNATIVE APPROACH
In conclusion, I'll note that there is a conservative alternative
approach possible: if any of the above features of Perl 5 split seem
really worth keeping, my suggestion for a "clean split" can be
implemented as a separate operator called, for example, "cleave".
(Consider the precedent of Perl 5 chomp being added alongside Perl 4
chop, not replacing it.)
I would consider this suboptimal, though; I think that an operator with
as straightforward and intuitive a name as "split" should behave in a
straightforward and intuitive way.
=head1 REFERENCES
Nil.