RFC 361 (v1) Simplifying split()

Perl6 RFC Librarian Sat, 30 Sep 2000 23:33:34 -0700
This and other RFCs are available on the web at
  http://dev.perl.org/rfc/

=head1 TITLE

Simplifying split()

=head1 VERSION

  Maintainer: Sean M. Burke <[EMAIL PROTECTED]>
  Date: 30 Sep 2000
  Mailing List: [EMAIL PROTECTED]
  Number: 361
  Version: 1
  Status: Developing

=head1 ABSTRACT

Perl 5's C<split> function is messy, and should be simplified.

=head1 DESCRIPTION

Perl 5 split does five things that I think are just annoying, and
which I suggest be removed:

=over

=item 1. The first argument to split is currently interpreted as a
regexp, regardless of whether or not it actually is one.  (Yes,
C<split '.', $foo> doesn't split on dot -- it's currently the same as
C<split /./, $foo>.)  I suggest that split be changed to treat only
regexps as regexps, and everything else as literals.

=item 2. Empty trailing fields are currently suppressed (although a
-1 as the third argument disables this).  I suggest that empty trailing
fields be retained by default.

=item 3. When not in list context, split currently splits into @_.  I
suggest that this side-effect be removed.

=item 4. split ?pat? in any context currently splits into @_.  I suggest
that this side-effect be removed.

=item 5. split ' ' (but not split / /) currently splits on whitespace,
but also removes leading empty fields.  I suggest that this
irregularity be removed.

=back
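
For readers who want to see points 1, 2 and 5 in action, here is a small Perl 5 sketch. The sample strings and variable names are only illustrations, not part of the proposal.

```perl
use strict;
use warnings;

# Point 1: a string first argument is still compiled as a regexp, so '.'
# matches every character; all fields come out empty and are then dropped.
my @dot = split '.', 'a.b.c';              # ()

# Point 2: empty trailing fields vanish unless the limit is negative.
my @suppressed = split /:/, 'a:b::';       # ('a', 'b')
my @retained   = split /:/, 'a:b::', -1;   # ('a', 'b', '', '')

# Point 5: ' ' is special-cased; / / is an ordinary one-space regexp.
my @awk   = split ' ', '  x y';            # ('x', 'y')
my @space = split / /, '  x y';            # ('', '', 'x', 'y')
```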

The last three of the above points speak for themselves.  I will focus
on the first two.

Most notably, I suggest that Perl 6 C<split('|', ...)> should work as
most people expect -- splitting on a literal bar.  (Under Perl 5,
C<split('|', ...)> is synonymous with C<split(/|/, ...)> -- i.e.,
split on nullstring or nullstring [sic].)

So I suggest:

   Perl 5:  split /\|/, ...
  be synonymous with (and be better written as)
   Perl 6:  split '|', ...
           # although  split /\|/, $bar...  remains valid
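
As a rough illustration of the Perl 5 side of this, the sketch below contrasts the surprising result with the two usual ways of splitting on a literal bar today. The variables C<$sep> and C<$bar> are made up for the example.

```perl
use strict;
use warnings;

my $sep = '|';
my $bar = 'a|b|c';

# '|' is compiled as /|/, which matches the empty string, so the input
# is split between every pair of characters.
my @chars = split '|', $bar;            # ('a', '|', 'b', '|', 'c')

# Escaping the bar by hand gives the split most people expect.
my @escaped = split /\|/, $bar;         # ('a', 'b', 'c')

# \Q...\E quotes metacharacters when the separator is held in a variable.
my @quoted = split /\Q$sep\E/, $bar;    # ('a', 'b', 'c')
```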

And as to the second point, the removal of empty trailing fields, I suggest:

   Perl 5:   @x = split /:/, $bar, -1;
  be synonymous with
   Perl 6:   @x = split ':', $bar;

If you want to remove empty trailing fields, under Perl 6 you should have
to do it explicitly:

   Perl 5:   @x = split /:/, $bar;
  be synonymous with
   Perl 6:   @x = split ':', $bar;
             while(@x and !length $x[-1]) { pop @x }

I believe that the current behavior of removing trailing empty fields is
unintuitive and surprising to learners; nothing about the concept of
splitting a string into a list suggests removing trailing empties.
(Moreover, I find that when I need to remove empties, it's not just the
trailing ones; so the current behavior is rarely just what I want.)
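
To make the distinction concrete, here is a minimal Perl 5 sketch of the three outcomes discussed above: keeping every field, trimming only trailing empties, and dropping all empties. The sample string is an assumption chosen for illustration.

```perl
use strict;
use warnings;

my $bar = 'a::b::';

# Keep every field, including empty trailing ones (the proposed default).
my @all = split /:/, $bar, -1;          # ('a', '', 'b', '', '')

# Drop only the trailing empties, which Perl 5 split does implicitly.
my @trimmed = @all;
pop @trimmed while @trimmed and !length $trimmed[-1];   # ('a', '', 'b')

# Drop every empty field, wherever it occurs.
my @nonempty = grep { length } @all;    # ('a', 'b')
```

Note that the interior empty field survives the trailing-only trim, which is why that trim alone is rarely enough when empties are unwanted.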

=head1 IMPLEMENTATION

I'll leave the C-coding details to the usual, capable implementers.

But I will note one minor complication with my first suggestion (that
literals and regexps be distinguished).  Consider:

  Perl 6:   @x = split $foo, $bar;

I suggest that the correct approach is to treat $foo's value as a
literal, unless it holds an object of class Regexp (or a class derived
from it?), in which case it should be treated as if the above were:

  Perl 6:   @x = split qr/$foo/, $bar;

In other words, in such cases it is not possible to know at compile time
whether a given "split" operator means literal-split or regexp-split.
I note that such cases are rare.
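
The run-time decision described above can be approximated in Perl 5 today. The sketch below is only an emulation under my own assumptions: the name C<clean_split> is hypothetical, and the check via C<Scalar::Util::blessed> and C<isa> is one possible way to accept Regexp objects and classes derived from them.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical helper (not defined by this RFC): emulates the proposed
# semantics using Perl 5's split.
sub clean_split {
    my ($sep, $string) = @_;

    # A qr// object (or an object of a class derived from Regexp) is used
    # as a regexp; anything else is quoted and matched literally.
    my $re = (blessed($sep) && $sep->isa('Regexp')) ? $sep : qr/\Q$sep\E/;

    # A limit of -1 retains empty trailing fields.
    return split $re, $string, -1;
}

my @literal = clean_split('|', 'a|b|');             # ('a', 'b', '')
my @regexp  = clean_split(qr/[|,]/, 'a|b,c');       # ('a', 'b', 'c')
```

Because the separator is always handed to split as a compiled regexp, the special case for C<' '> does not apply here, so leading empty fields are kept as well.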

=head1 ALTERNATIVE APPROACH

In conclusion, I'll note that there is a conservative alternative
approach possible: if any of the above features of Perl 5 split seem
really worth keeping, my suggestion for a "clean split" can be
implemented as a separate operator called, for example, "cleave".

(Consider the precedent of Perl 5 chomp being added alongside Perl 4
chop, not replacing it.)

I would consider this suboptimal, though; I think that an operator with
as straightforward and intuitive a name as "split" should behave in a
straightforward and intuitive way.

=head1 REFERENCES

Nil.