Nicholas Clark wrote:
> IIRC Ilya mailed p5p bemoaning the fact that perl's SVs use a continuous
> buffer. A split-buffer representation (where a hole is allowed in the
> middle of the buffer data) permits much faster replacement type operations,
> as there is less copying, and you can move the hole around to suit your
> needs.

I posted an RFC for something like that a while ago but got no reaction from the crowd. It is not an internal optimisation like the one stated above, but a whole new [no, I won't say paradigm] concept that could be *the* reason that makes perl6 worthwhile.

I've attached the RFC again and would hope to at least get some "Nah..." or "Yeah!" as feedback.

Cheers!

Roland
--
[EMAIL PROTECTED]

=head1 TITLE

Perl should support non-linear text.

=head1 VERSION

  Maintainer: Roland Giersig <[EMAIL PROTECTED]>
  Date: 19 Oct 2000
  Version: 1
  Mailing List: perl6-internals ?
  Number: ?

=head1 ABSTRACT

Right now, Perl performs its magic only upon linear strings of ASCII and Unicode text. As Ilya Zakharevich stated in his recent interview (http://www.perl.com/pub/2000/09/ilya.html), the new feature that would help today's Perl programmers most would be for Perl to be able to perform its mighty string operations on marked-up (non-linear) text consisting of linear chunks of text strings that carry different attributes.

This could very well be THE new feature that justifies the complete Perl6 rewrite!

=head1 DESCRIPTION

When Perl first came into being, the world was full of ASCII text, so Perl became strong in manipulating ASCII text. But this has changed. Nowadays even the simplest documents (e.g. mail messages) tend to be in some marked-up format or other, and programmers worldwide are struggling to find a way to manipulate those.

To aid these efforts I therefore propose to enhance the string format used in Perl: non-linear text, consisting of chunks of linear text (Unicode, of course) that have attributes attached.

Take this HTML for example:

  <html>Text with a <b>larger <font size=+1>l</font>etter</b> in it. </html>

and try to find a way to substitute the word `letter' with `word', with outside formatting (<b>) preserved. Next to impossible? I found no easy (but general) way, not even with HTML::Parser et al.

If perl could handle non-linear strings, this could be done with a simple s/letter/word/. Ain't that time-saving!! For example

  s/(l)etter/${"w":${1:}}ord/

could do the magic (see below for a syntax proposal). Or, to make formulas more readable:

  s/\b(\w+)\^(\d+)/$1${2:raised=>1}/

=head1 IMPLEMENTATION

Ugh, you got me there. I know very little about Perl internals, so I can't even pretend to know how this should be done. Maybe Ilya has already started on a prototype?
;-)

Anyway, the current document parsers (HTML::Parser et al.) already build non-linear text data structures. Basically these structures are lists of strings interspersed with refs to embedded structures (and attributes) of the same type. Whether this structure is flexible enough for most purposes has to be discussed. Attributes could simply be stored as hashes, so the chunks would have hash refs attached. This sounds rather easy to accomplish.

So, what today is a string would internally become an array of strings with attached hashes. This doesn't sound too strange, but again, this is for others to decide.

=head1 SYNTAX

We need a way to attach attributes to chunks of text that is both backward compatible and compact. Hmm, as variable access by name is deprecated anyhow, we could use ${var} to mean $var and ${"text"} to mean "text". Now we can use `:' to separate the varname from the attributes:

  ${foo:size}   # accesses attribute `size' in variable `foo'

  # set attribute `size'
  ${foo:size} = $fontsize;

  # copy attribute `a1' of text in var `bar' to attribute `a2' in var `foo'
  ${foo:a2} = ${bar:a1};

  # copy all attributes, but leave text as-is
  ${foo:} = ${bar:};

Now for literal strings with embedded attributes:

  $foo = "just another string";
  ${foo:size} = 12;

or

  $foo = ${"just another string":size=>10};

This can nest:

  $bar = ${"${"L":size=>12}arge":size=>10};

  ${bar:size}   # gives 10

How to loop over all chunks? Hmm, seems like split could handle it OK if the regex engine can match chunk borders. Seems like another special token is needed. How about `\C' for chunk? Or is this already taken?

  $astring = ${"${"L":size=>12}arge ${"S":size=>8}mall":size=>10};
  foreach my $chunk (split /\C/, $astring) {
    print "$chunk: ${chunk:size}\n";
  }

would print

  L: 12
  arge: 10
  S: 8
  mall: 10

What if an attributed string is split in half? Well, in that case, the attributes must be duplicated.
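
A rough Perl 5 sketch of how the chunk idea could behave today, assuming a flat list of [text, attribute-hash] pairs in which nested attributes are already resolved (the innermost value wins). The names text_of and chunk_substr are invented for illustration; they are not part of the proposal or of any existing module.

```perl
use strict;
use warnings;

# Assumed representation: a string is a list of chunks,
# each chunk being [ $text, \%attributes ].
my @astring = (
    [ 'L',     { size => 12 } ],
    [ 'arge ', { size => 10 } ],
    [ 'S',     { size => 8  } ],
    [ 'mall',  { size => 10 } ],
);

# Plain text content, ignoring all attributes.
sub text_of { return join '', map { $_->[0] } @_ }

# Hypothetical chunk-aware substr: returns the chunks that cover
# [$offset, $offset + $len). Every chunk that is kept or cut gets
# its own copy of the attribute hash.
sub chunk_substr {
    my ($chunks, $offset, $len) = @_;
    my @out;
    my $pos = 0;
    my $end = $offset + $len;
    for my $chunk (@$chunks) {
        my ($text, $attr) = @$chunk;
        my $chunk_end = $pos + length $text;
        my $from = $offset > $pos       ? $offset : $pos;
        my $to   = $end    < $chunk_end ? $end    : $chunk_end;
        if ($from < $to) {
            push @out, [ substr($text, $from - $pos, $to - $from), { %$attr } ];
        }
        $pos = $chunk_end;
    }
    return @out;
}

# Loop over all chunks, roughly what split on chunk borders would allow.
for my $chunk (@astring) {
    print "$chunk->[0]: $chunk->[1]{size}\n";
}

# Split the string in half; attributes travel with each half.
my $half        = int(length(text_of(@astring)) / 2);
my @first_half  = chunk_substr(\@astring, 0, $half);
my @second_half = chunk_substr(\@astring, $half, length(text_of(@astring)) - $half);
```

Copying the hash (`{ %$attr }`) instead of sharing the ref is one way to read "the attributes must be duplicated": changing an attribute on one half then leaves the other half alone.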
In the proposed syntax,

  $foo = ${"no attrib here ${"ATTRIBUTES":size=>12} nothing here":size=>8};
  $firsthalf = substr($foo, 0, length($foo)/2);

should set $firsthalf to

  ${"no attrib here ${"ATTR":size=>12}":size=>8}

and

  substr($foo, length($foo)/2, 14, "really ${"nothing":attrib=>1}");

should set $foo to

  ${"no attrib here ${"ATTR":size=>12}${"really ${"nothing":attrib=>1}":} here":size=>8}

Hmm, what about string comparisons? `eq' and friends should simply continue to work as usual on the string contents. Do we need some kind of meta-eq to be able to compare the attribs also?

There are a lot of other issues to work out, but I'd like to first get some approval from the gurus, so I'll stop here.

=head1 REFERENCES

http://www.perl.com/pub/2000/09/ilya.html

=cut
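
On the comparison question raised in the RFC, here is a rough Perl 5 sketch, assuming a string is represented as a flat list of [text, attribute-hash] chunks. The names content_eq and meta_eq are invented for illustration, and comparing attributes character by character is only one possible meaning for a "meta-eq".

```perl
use strict;
use warnings;

# Assumed representation: a string is a list of [ $text, \%attributes ] chunks.
my @plain     = ( [ 'Large', {} ] );
my @styled    = ( [ 'L', { size => 12 } ], [ 'arge', { size => 10 } ] );
my @rechunked = ( [ 'L', { size => 12 } ], [ 'ar', { size => 10 } ],
                  [ 'ge', { size => 10 } ] );

# Text content only: this is what plain `eq' would keep comparing.
sub text_of { return join '', map { $_->[0] } @_ }

sub content_eq {
    my ($x, $y) = @_;
    return text_of(@$x) eq text_of(@$y) ? 1 : 0;
}

# One entry per character, tagged with its sorted attribute list, so that
# chunk boundaries alone do not make two strings differ.
sub per_char {
    my @out;
    for my $chunk (@_) {
        my ($text, $attr) = @$chunk;
        my $tag = join ',', map { $_ . '=' . $attr->{$_} } sort keys %$attr;
        push @out, map { $_ . "\0" . $tag } split //, $text;
    }
    return @out;
}

# Hypothetical meta-eq: same text and same attributes on every character.
sub meta_eq {
    my ($x, $y) = @_;
    my @xs = per_char(@$x);
    my @ys = per_char(@$y);
    return 0 unless @xs == @ys;
    for my $i (0 .. $#xs) {
        return 0 unless $xs[$i] eq $ys[$i];
    }
    return 1;
}
```

With these definitions, `content_eq(\@plain, \@styled)` is true while `meta_eq(\@plain, \@styled)` is false, and `meta_eq(\@styled, \@rechunked)` is true because only the chunk borders differ.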