Re: RFC 58 (v1) C changes.

Bart Lateur Tue, 08 Aug 2000 13:26:42 -0700
On Tue, 8 Aug 2000 10:12:49 -0700 (PDT), Larry Wall wrote:

>If chomp exists in Perl 6 at all, I think it would
>have to be some kind of method call on the string that figures out what
>the discipline determined to be the terminator *for the current line*.
>(Note that under Unicode, we might well have one line terminated with a
>line separator, and the next line terminated with a page separator, and
>the line after that terminated with a CRLF.)

I have a far more radical, but still very rough suggestion. Suppose we
throw out the current concept of chomp (and chop) completely.

A small but extremely subjective pseudo-historical overview.

Originally, I assume that Larry Wall only expected to read files line by
line, and that each and every single line would be terminated with a
newline (Unix: LF). Then the concept is simple: to get rid of the
newline, simply remove the last character.

Then it dawned on people that the very last line of a file need not end
in a newline at all. After all, a newline is just a line separator, not
necessarily a line terminator (although a newline can only appear at
the end of a line). Also, people could choose a multi-character
end-of-line string. So chomp() was introduced, with two features:

 * It checks before removing anything, and
 * it can remove more than one character.

Even later, people were asking for a regex-like $/ processing, mainly in
order to be able to process text files from other platforms
transparently. That was rejected, likely for speed reasons. The only
regex-like behaviour is with $/ set to "" (not undef), which searches
for a sequence of multiple "\n" (at least two) in a row.

chomp() has to match this. Whatever ends up at the end of the line
because you read it with $/ set to whatever you want, chomp() is able to
find and remove.
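To make the current behaviour concrete, here is a small sketch in today's Perl 5. The helper name chomp_with() is made up for illustration; everything it calls is ordinary chomp() and $/.

```perl
use strict;
use warnings;

# chomp() removes whatever $/ currently holds, and only if it is
# actually present at the end of the string.
sub chomp_with {
    my ($terminator, $line) = @_;
    local $/ = $terminator;        # restored automatically on return
    my $removed = chomp $line;     # chomp returns the number of characters removed
    return ($line, $removed);
}

# A two-character terminator is removed in one go.
my ($text, $count) = chomp_with("\r\n", "record one\r\n");
print "[$text] removed $count\n";

# A last line without a terminator is left alone.
($text, $count) = chomp_with("\r\n", "last line");
print "[$text] removed $count\n";

# Paragraph mode ($/ = ""): all trailing newlines go.
($text) = chomp_with("", "a paragraph\n\n\n");
print "[$text]\n";
```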

And now, projections of a possible future.

$/ may be replaced by something very liberal, possibly even with
something like a regex. Suppose that $RE is that regex, for example
qr/\015\012|[\n\r\f]/. chomp() can remove that from the line, by doing
something like

        s/(?:\015\012|[\r\n\f])$//;

But that is rather inefficient, speedwise.
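Wrapped up as a function, that idea could look roughly like the sketch below. The name regex_chomp() is hypothetical, and I anchor with \z rather than $ so that the match can only sit at the very end of the string.

```perl
use strict;
use warnings;

# The liberal line-end pattern from the text above.
my $RE = qr/\015\012|[\r\n\f]/;

# Remove one line terminator from the end of the referenced string.
# Returns the number of characters removed (0 if none was found),
# mirroring what chomp() returns.
sub regex_chomp {
    my ($string_ref) = @_;
    if ($$string_ref =~ s/($RE)\z//) {
        return length $1;
    }
    return 0;
}

my $line = "from a DOS file\r\n";
my $removed = regex_chomp(\$line);
print "[$line] removed $removed\n";
```

Note that the regex engine scans from the front of the string looking for a place where the pattern can match, which is where the cost comes from on long lines.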

Looking back: chop() will remove the last character whatever it is,
chomp() will currently look for a literal string, at the end of the
string. Still rather efficient. The regex is not; unless the regex
mechanism was modified so the string is scanned back to front, i.e.
starting at the end and going backwards. But, aren't we doing the same
work twice? Didn't <FILE> already look for, and effectively find that
variable length line terminator? Let's cache that result, somewhere.

A reasonably backward compatible system could turn a string, ANY scalar,
into an object. When reading the value as a string, the overloaded '""'
function could return the scalar's string value. <FILE> would return
such an object, with a field set to the length of the line terminator
found. Let's call it chomplength. chomp() would read that field, reduce
the length of the string value by that many characters, and set
chomplength to 0. A string put together in another way could set
chomplength to undef, indicating "to be determined" for example using
the above regex.
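A rough Perl 5 mock-up of such an object, just to show the moving parts. The class name ChompableLine and the methods read_from() and do_chomp() are invented here; only chomplength comes from the description above. The sketch assumes $/ holds a plain, non-empty string, so the terminator length is simply length($/) whenever the line ends with it.

```perl
use strict;
use warnings;

package ChompableLine {    # hypothetical class, not part of Perl
    use overload
        '""'     => sub { $_[0]{text} },    # reading the object as a string
        fallback => 1;

    # Fallback pattern for strings whose terminator length is not cached.
    my $TERMINATOR = qr/\015\012|[\r\n\f]/;

    sub new {
        my ($class, $text, $chomplength) = @_;
        return bless { text => $text, chomplength => $chomplength }, $class;
    }

    # Read one line and cache how long its terminator was.
    # Assumption: $/ is a plain, non-empty string.
    sub read_from {
        my ($class, $fh) = @_;
        my $text = readline $fh;
        return unless defined $text;
        my $len   = length $/;
        my $found = length($text) >= $len && substr($text, -$len) eq $/;
        return $class->new($text, $found ? $len : 0);
    }

    sub chomplength { $_[0]{chomplength} }

    # Shorten the string by chomplength characters and reset the field.
    # An undefined chomplength means "to be determined" via the regex.
    sub do_chomp {
        my ($self) = @_;
        my $n = $self->{chomplength};
        if (!defined $n) {
            $n = $self->{text} =~ /($TERMINATOR)\z/ ? length $1 : 0;
        }
        substr($self->{text}, -$n, $n, '') if $n;
        $self->{chomplength} = 0;
        return $n;
    }
}

# A string put together by hand has no cached terminator length.
my $built = ChompableLine->new("assembled by hand\r\n");
$built->do_chomp;
print "[$built]\n";
```

Every string would have to carry the extra field around, which is exactly the overhead in question.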

But, isn't that just a bit too general? Would we really accept the
overhead this would introduce to ALL string manipulations? I think not.
So let's drop this generic concept of chop() and chomp(). What would we
lose?

 * chop(): it can be used to get and remove any character from the end
of any string. Currently, it behaves as 

        chop($string) === substr $string, -1, 1, ''

(I think). That means that you could do this:

        while(length($last = chop $string)) {
            # process that last character
        }

in order to process each character in turn, back to front. So we lose
that. Big deal.
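For what it's worth, the same loop is easy to write without chop(). In this sketch the two function names are made up, and the substr() version checks the length first so that it never touches an empty string.

```perl
use strict;
use warnings;

# Walk a string back to front with chop(), as in the loop above.
sub reversed_chars_chop {
    my ($string) = @_;                  # works on a copy
    my @chars;
    while (length(my $last = chop $string)) {
        push @chars, $last;
    }
    return @chars;
}

# The same walk using four-argument substr() instead of chop().
sub reversed_chars_substr {
    my ($string) = @_;
    my @chars;
    while (length $string) {
        push @chars, substr($string, -1, 1, '');
    }
    return @chars;
}

print join('', reversed_chars_chop('abc')), "\n";
print join('', reversed_chars_substr('abc')), "\n";
```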

 * chomp(). Has anybody ever done this:

        chomp($string = <<'EOT');
        This is a multiline string.
        This is line two.
        This is the last line.
        EOT

Just to get rid of the final newline? Well, I have. But losing this
capability is not something to lose any sleep over.

So what are these really good for? To get rid of the line terminator, or
"Record Separator", when reading from a file. That is what they are for,
that is what we should facilitate. Not the chop()ping or chomp()ing of
just any string.

So, let's keep in tune with the RFC that started all this: an easier way
to integrate the functionality of chomp() into the activity of reading
one line. So let's make the chomp() implicit.

All you'd have to do, is set a boolean flag on the "filehandle object",
saying: "Oi, I want this chomped.". For example:

        $fh->chomped(1);

Then, what <$fh> would do, is read one line, looking for whatever line
end it accepts, finish where found, and depending on whether this flag
was set or not, remove that many characters from the end. Finito. Very
simple, probably pretty damn fast, too. The syntax could remain:

        while(<$fh>) {
                ...
        }

which would only terminate at the end of the file. The bare keyword
chomp() would no longer exist.
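This can be mocked up in Perl 5 to get a feel for it. Everything in the sketch below is an assumption for illustration: the class name ChompingHandle, the next_line() method, the 4096-byte read size and the fixed line-end pattern. Only the chomped() flag and the <$fh> syntax come from the proposal; overloading '<>' is just a Perl 5 trick to keep that syntax.

```perl
use strict;
use warnings;

package ChompingHandle {    # hypothetical wrapper class
    use overload
        '<>'     => sub { $_[0]->next_line },   # lets <$h> call next_line()
        fallback => 1;

    # The line ends this sketch accepts.
    my $TERMINATOR = qr/\015\012|[\r\n\f]/;

    sub new {
        my ($class, $fh) = @_;
        return bless { fh => $fh, buffer => '', eof => 0, chomped => 0 }, $class;
    }

    # Get or set the "I want this chomped" flag.
    sub chomped {
        my $self = shift;
        if (@_) { $self->{chomped} = $_[0] ? 1 : 0 }
        return $self->{chomped};
    }

    sub next_line {
        my ($self) = @_;
        # Refill until a terminator is followed by at least one more
        # character (so a CR at the end of a chunk is not mistaken for
        # a bare CR when the LF is still to come), or until EOF.
        until ($self->{eof} or $self->{buffer} =~ /$TERMINATOR./s) {
            my $got = read $self->{fh}, $self->{buffer}, 4096,
                           length $self->{buffer};
            $self->{eof} = 1 unless $got;
        }
        return if $self->{buffer} eq '';    # end of file

        # Split off one line; the terminator was found here, so
        # dropping it costs nothing extra.
        $self->{buffer} =~ s/\A(.*?)($TERMINATOR|\z)//s;
        my ($text, $end) = ($1, $2);
        return $self->{chomped} ? $text : $text . $end;
    }
}

my $data = "one\r\ntwo\n\nthree";
open my $raw, '<', \$data or die "cannot open in-memory handle: $!";
my $fh = ChompingHandle->new($raw);
$fh->chomped(1);
while (defined(my $line = <$fh>)) {
    print "[$line]\n";
}
```

One thing the mock-up shows: with the flag set, an empty line comes back as "", which is false, so the loop has to test for definedness (as while(<$fh>) already does implicitly today) in order to stop only at the end of the file.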

>On the other end, we're constrained to be able to translate current
>uses of chop and chomp to something that will work in Perl 6, so we
>can't just throw them out and say the input disciplines will do it all.

Well, duh...

-- 
        Bart.