Re: Perlstorm #0040
> I lie: the other reason qr{} currently doesn't behave like that is that
> when we interpolate a compiled regexp into a context that requires it be
> recompiled,

Interpolated qr() items shouldn't be recompiled anyway. They should be treated as subroutine calls. Unfortunately, this requires a reentrant regex engine, which Perl doesn't have. But I think it's the right way to go, and it would solve the backreference problem, as well as many other related problems.
Re: RFC 208 (v2) crypt() default salt
Bart Lateur:

> >If there are no objections, I will freeze this in twenty-four hours.
>
> Oh, I have a small one: I feel that this pseudo-random salt should NOT
> affect the standard random generator. I'll clarify: by default, if you
> feed the pseudo-random generator with a certain number, you'll get the
> same sequence of output numbers, every single time. There are
> applications for this. I think that any call to crypt() should NEVER
> change this sequence of numbers; in particular, it should not skip a
> number every time crypt() is called with one parameter.
>
> Therefore, crypt() should have its own pseudo-random generator. A
> simple task, really: same code, but a different seed variable.

I had considered this for the original RFC, but I decided against it. To implement it, Perl would have to have its own built-in random number generator, because there is no way to save and restore the old state of rand() (for example). It would substantially complicate the code.

And the problem you describe is not really a problem. There has never been any guarantee that a program would produce the same sequence of random numbers after a change to the Perl binary. More recent versions of Perl use random() or drand48() if they are available, instead of rand(). A program run under an old version of Perl and then a newer version that used random() instead of rand() would generate a different sequence of random numbers depending on which version of Perl was running it, even if the seed was the same. This has never been an issue in the past, so I did not consider it important.

I will add a note about this to the RFC. If there are no other comments, I will freeze it in 24 hours.
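Bart's suggestion ("same code, but a different seed variable") is easy to sketch in pure Perl. Everything below -- the generator constants, and the names salt_rand() and crypt_with_salt() -- is illustrative, not part of the RFC:

```perl
use strict;
use warnings;

# A private linear congruential generator used only for salts.  Its
# state ($salt_seed) is separate from rand()'s state, so calling
# crypt_with_salt() never perturbs the standard generator's sequence.
my $salt_seed = time() ^ $$;

sub salt_rand {
    my ($limit) = @_;
    # Constants from the classic ANSI C rand(); adequate for salts.
    $salt_seed = ($salt_seed * 1103515245 + 12345) % 2147483648;
    return $salt_seed % $limit;
}

# The 64 characters traditionally allowed in a crypt() salt.
my @salt_chars = ('a' .. 'z', 'A' .. 'Z', '0' .. '9', '.', '/');

sub crypt_with_salt {
    my ($plaintext) = @_;
    my $salt = join '',
        map { $salt_chars[ salt_rand(scalar @salt_chars) ] } 1 .. 2;
    return crypt($plaintext, $salt);
}
```

Whether such a generator belongs in the core -- at the cost of the extra complication argued above -- is exactly the question at issue.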
Re: Threaded Perl bytecode (was: Re: stackless python)
> > Joshua N Pritikin writes:
> > : http://www.oreillynet.com/pub/a/python/2000/10/04/stackless-intro.html
> >
> > Perl 5 is already stackless in that sense, though we never implemented
> > continuations. The main impetus for going stackless was to make it
> > possible to implement a Forth-style threaded code interpreter, though
> > we never put one of those into production either.

There's a large school of thought in the Lisp world that holds that full continuations are a bad idea. See for example:

    http://www.deja.com/threadmsg_ct.xp?AN=635369657

Executive summary of this article:

* Continuations are hard to implement and harder to implement efficiently. Languages with continuations tend to be slower because of the extreme generality constraints imposed by the presence of continuations.

* Typical uses of continuations are for things like exception handling. Nobody really uses continuations because they are too difficult to understand. Exception handling is adequately served by simpler and more efficient catch-throw mechanisms which everyone already understands.

Anyone seriously interested in putting continuations into Perl 6 would probably do well to read the entire thread headed by the article I cited above.
Critique available
My critique of the Perl 6 RFC process and following discussion is now available at http://www.perl.com/pub/2000/11/perl6rfc.html Mark-Jason Dominus [EMAIL PROTECTED] I am boycotting Amazon. See http://www.plover.com/~mjd/amazon.html for details.
Re: Critique available
> To strive for balance, I think perl.com's home page should also have the
> links to Larry's ALS talk and slides.

Thanks very much. I have asked the folks at Songline to arrange this. We were going to carry these, and in fact the ORA folks were prepared to complete Nat's transcript, but then Ask posted them first, so we didn't go ahead with that. But you are right, the web page should have links to them. This was an oversight on my part.
Fwd: Response to Critique of Perl 6 RFC Process
Frank Tobin has generously given me permission to forward his comments to this list.

--- Forwarded Message

Date: Thu, 2 Nov 2000 00:31:42 -0600 (CST)
From: Frank Tobin <[EMAIL PROTECTED]>
X-Sender: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: Response to Critique of Perl 6 RFC Process
Message-ID: <[EMAIL PROTECTED]>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I appreciated reading your critique of the RFC process. I think one problem that contributed to the mess of "mishandled" RFC's was that there were no real written guidelines on what authors should do with RFC's in various circumstances. For example, when should an RFC be frozen? How about withdrawn? Does withdrawn mean the RFC should be revoked because there is something inherently bad about it (e.g., wanting a perfect data structure), or can it also mean the RFC is simply heavily disliked? Does frozen mean "it looks good, let's go for it", or does it mean "no further changes will improve the RFC"?

From personal experience, I was the maintainer of RFC 357 (Perl should use XML for docs instead of POD). This generated a lot of criticism/debate. In general it seemed like the heavy majority of the Perl community was against it. I decided to mark the RFC as frozen, while adding a section in the RFC about how the community was against it (although I didn't go into much detail, I admit). I decided against withdrawing it, because I felt there wasn't anything inherently wrong about the RFC; it was just disliked. The problem seemed to be that "withdrawn" and "frozen" weren't orthogonal choices.

Perhaps one problem was that there was only one field for the status of an RFC. Perhaps two were needed. One of these would be "Closure: Open/Closed", which would indicate the activeness of the RFC, and the other would be "Resolution: Popular/Unpopular/AlreadyDone/Impossible/etc".
Maybe this would've given maintainers the ability to better describe the status of the RFC; I know it would've made my choice easier.

--
Frank Tobin
http://www.uiuc.edu/~ftobin/

--- End of Forwarded Message
Re: Critique available
> I just figured it was time for a little nudge.

Yes, thank you. It is on www.perl.com now.
Re: Critique available
> Anyone think others are needed?

"Stick to the subject."
Garbage collector slowness
http://www.xanalys.com/software_tools/mm/articles/lang.html#emacs.lisp

Erik Naggum ([EMAIL PROTECTED]) reports:

    I have run some tests at the U of Oslo with about 100 users who
    generally agreed that Emacs had become faster in the latest Emacs
    pretest. All I had done was to remove the "Garbage collecting"
    message which people perceive as slowing Emacs down and tell them
    that it had been sped up. It is, somehow, permissible for a program
    to take a lot of time doing any other task than administrative
    duties like garbage collection.
Re: Garbage collector slowness
> "The new version must be better because our gazillion dollar marketing
> campaign said so. (We didn't really *fix* anything.)"

The part I found interesting was the part about elimination of the message. Perceived slowness is also important.
Re: Schwartzian Transform
> So you can say
>
>     use Memoize;
>     # ...
>     memoize 'f';
>     @sorted = sort { my_compare(f($a),f($b)) } @unsorted
>
> to get a lot of the effect of the S word.

Yes, and of course the inline version of this technique is also common:

    @sorted = sort { my $ac = $cache{$a} ||= f($a);
                     my $bc = $cache{$b} ||= f($b);
                     my_compare($ac, $bc);
                   } @unsorted;

Joseph Hall calls this the 'Orcish Maneuver'.

However (I don't know who suggested this, but:)

> > > > >I'd think /perl/ should complain if your comparison function isn't
> > > > >idempotent (if warnings on, of course). If nothing else, it's probably an
> > > > >indicator that you should be using that schwartz thang.

I have to agree with whoever followed up that this is a really dumb idea. It reminds me of the time I was teaching the regex class at TPC3, and I explained how the /o in /$foo/o represents a promise to Perl that $foo will never change, so Perl can skip the operation of checking to see if it has changed every time the match is performed. Then there was a question from someone in the audience, asking if Perl would emit a warning if $foo changed.

On the other side of the argument, however, I should mention that I've planned for a long time to write a Sort::Test module which *would* check to make sure the comparator function behaved properly, and would report problems. When you use the module, it would make all your sorts run really slowly, but you would get a warning if your comparator was bad.

Idempotency is not the important thing here. The *important* property that the comparator needs, and the one that bad comparators usually lack, is transitivity:

    if my_compare(a,b) < 0, and my_compare(b,c) < 0, then it should
    also be the case that my_compare(a,c) < 0, for all keys a, b, and c.

Sort::Test would run a quadratic sort such as a bubble sort, and make sure that this essential condition held true.
Note in particular that if the comparator has the form { my_compare(f(a),f(b)) }, then it does not matter if f() is idempotent; what really matters is that my_compare should have the property above.

I had also planned to have optional checks:

    use Sort::Test 'self';   # check that my_compare(a,a) == 0 for all a
    use Sort::Test 'twice';  # check that my_compare(a,b) == my_compare(a,b) for all a,b

This last is essentially the idempotency restriction again.

The reason I've never implemented this module is that in perl 5, sort() cannot be overridden, so the usefulness seemed low; you would have to rewrite your source code to use it. I hope this limitation is fixed in perl 6, because it would be a cool hack.

Finally, another argument in the opposite direction yet again. It has always seemed to me that this 'inconsistent sort comparator' thing is a tempest in a teapot. In the past it has gotten a lot of attention because some system libraries have a qsort() function that dumps core if the comparator is inconsistent. To me, this obviously indicates a defective implementation of qsort(). If the sort function dumps core or otherwise detects an inconsistent comparator, it is obviously functioning suboptimally. An optimal sort will not notice that the comparator is inconsistent, because the only way you can find out that the comparator is returning inconsistent results is to call it in a situation where you already know what the result should be, and have it return a different result. An optimal sort function will not call the comparator if it already knows what the result should be! For example, consider the property from above: if my_compare(a,b) < 0, and my_compare(b,c) < 0, then my_compare(a,c) < 0. If the qsort() already knows that a
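The essential check such a Sort::Test might perform can be sketched directly. check_comparator() and its deliberately brute-force loop are my own illustration, not anything from the proposed module:

```perl
use strict;
use warnings;

# Brute-force check of the transitivity property described above:
# if cmp(a,b) < 0 and cmp(b,c) < 0, then cmp(a,c) < 0 must hold.
# O(n^3) on purpose: this is a testing aid, not a sorting algorithm.
sub check_comparator {
    my ($cmp, @keys) = @_;
    for my $x (@keys) {
        for my $y (@keys) {
            next unless $cmp->($x, $y) < 0;
            for my $z (@keys) {
                next unless $cmp->($y, $z) < 0;
                return 0 if $cmp->($x, $z) >= 0;   # inconsistent!
            }
        }
    }
    return 1;   # no violation found on this key set
}
```

A plain numeric <=> comparator passes; a cyclic "rock-paper-scissors" comparator, where 0 beats 2, fails.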
Re: Please make "last" work in "grep"
On (03 May 2001 10:23:15 +0300) you wrote:

> Michael Schwern:
> > > Would be neat if: my($first) = grep {...} @list; knew to stop itself, yes.
> > >
> > > It also reminds me of mjd's mention of: my($first) = sort {...} @list;
> > > being O(n) if Perl were really Lazy.
>
> But it would need a completely different algorithm.

Not precisely. If you have lazy evaluation, then quicksort is exactly what is wanted here. For example, if you implement quicksort in the straightforward way in Haskell, and write

    min = head (quicksort list)

then it *does* run in O(n) time; in this case quicksort reduces to Hoare's algorithm for min.

> my ($first, $second, $third) = sort {...} @list;

The Haskell version of this also runs in O(n) time.

> is kind-of plausible. So we'd definitely want
>
> ((undef)x((@list+1)/2), $median) = sort {...} @list;

The Haskell equivalent of this (still using quicksort) runs in O(n log n) time, which I believe is optimal for finding the median.
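For readers without a Haskell system handy, the algorithm that lazy quicksort degenerates to can be written out directly. This quickselect() is my own sketch, assuming numeric keys:

```perl
use strict;
use warnings;

# Hoare's selection: find the k-th smallest element (k is 0-based) in
# expected O(n) time.  This is what "take the first element of a lazy
# quicksort" effectively computes, without sorting the rest.
sub quickselect {
    my ($k, @list) = @_;
    my $pivot   = $list[ int rand @list ];
    my @less    = grep { $_ <  $pivot } @list;
    my @equal   = grep { $_ == $pivot } @list;
    my @greater = grep { $_ >  $pivot } @list;
    return quickselect($k, @less) if $k < @less;
    return $pivot                 if $k < @less + @equal;
    return quickselect($k - @less - @equal, @greater);
}
```

With k = 0 it is Hoare's algorithm for the minimum; with k = n/2 it finds the median, and the expected running time stays linear because only one partition is ever recursed into.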
Re: explicitly declare closures???
Says Dave Mitchell:

> Closures ... can also be dangerous and counter-intuitive, especially to
> the uninitiated. For example, how many people could say what the
> following should output, with and without $x commented out, and why:
>
>     {
>         my $x = "bar";
>         sub foo {
>             # $x # <- uncommenting this line changes the outcome
>             return sub {$x};
>         }
>     }
>     print foo()->();

That is confusing, but it is not because closures are confusing. It is confusing because it is a BUG. In Perl 5, named subroutines are not properly closed. If the bug were fixed, the result would be 'bar' regardless of whether or not $x was commented. This would solve the problems with mod_perl also.

The right way to fix this is not to eliminate closures, or to require declarations. The right way to fix this is to FIX THE BUG.
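The bug afflicts only named subroutines. A sketch of the standard Perl 5 workaround, storing an anonymous sub in a lexical, shows the behavior the named version ought to have:

```perl
use strict;
use warnings;

# An anonymous sub is a true closure over $x, so the result does not
# depend on accidents of when the enclosing block was compiled --
# unlike the named "sub foo" in the example above.
my $foo;
{
    my $x = "bar";
    $foo = sub { return sub { $x } };
}
print $foo->()->(), "\n";   # prints "bar"
```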
Re: Objects, methods, attributes, properties, and other related frobnitzes
Dan Sugalski <[EMAIL PROTECTED]>:

> At 2:06 PM + 2/19/03, Peter Haworth wrote:
> >On Fri, 14 Feb 2003 15:56:25 -0500, Dan Sugalski wrote:
> >> I got clarification. The sequence is:
> >>
> >> 1) Search for method of the matching name in inheritance tree
> >> 2) if #1 fails, search for an AUTOLOAD
> >> 3) if #2 fails (or all AUTOLOADs give up) then do MM dispatch
> >
> >Shouldn't we be traversing the inheritance tree once, doing these three
> >steps at each node until one works, rather than doing each step once for
> >the whole tree? MM dispatch probably complicates this, though.
>
> No, you have to do it multiple times. AUTOLOAD is a last-chance
> fallback, so it ought not be called until all other chances have
> failed.

Pardon me for coming in in the middle, but it seems to me that only one traversal should be necessary. The first traversal can accumulate a temporary linked list of AUTOLOAD subroutines. If the first traversal locates an appropriate method, the linked list is discarded. If no appropriate method is found, control is dispatched to the AUTOLOAD subroutine at the head of the list, if there is one; if the list is empty, the MM dispatch is tried.
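The one-traversal scheme can be sketched in Perl itself. It uses mro::get_linear_isa(), which modern perls supply; dispatch() and its simplified AUTOLOAD handling are illustrative only, not the real method-resolution code:

```perl
use strict;
use warnings;
use mro;   # mro::get_linear_isa, core since perl 5.9.5

# One traversal: stop at the first real method; meanwhile accumulate
# every AUTOLOAD seen, to be consulted only if no method exists.
sub dispatch {
    my ($class, $method, @args) = @_;
    my @autoloads;
    for my $pkg (@{ mro::get_linear_isa($class) }) {
        no strict 'refs';
        if (defined &{"${pkg}::${method}"}) {
            return &{"${pkg}::${method}"}(@args);   # real method wins
        }
        if (defined &{"${pkg}::AUTOLOAD"}) {
            push @autoloads, \&{"${pkg}::AUTOLOAD"};
        }
    }
    # No method anywhere in the tree: fall back to the accumulated
    # AUTOLOADs.  (A real implementation would let each one decline in
    # turn, and only then try MM dispatch.)
    return $autoloads[0]->(@args) if @autoloads;
    die "no method '$method', no AUTOLOAD; MM dispatch goes here\n";
}
```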
Testing job
I'm writing automated tests for the example code in my book, which will go into production early next month. I have the harness and test apparatus all set up; I wrote a complete set of tests for chapter 6, and I think I know how I want it done. But I need help writing the tests themselves, because time is short and I have a lot of other stuff to do. If you would be interested in helping me with this, send me mail right away. I believe that my publisher is willing to pay for it, although I don't know how much. -D.
Re: RFC 105 (v1) Downgrade or remove "In string @ must be \@" error
This has already been done for Perl 5.6.1. Here is what perldelta.pod has to say.

=head2 Arrays now Always Interpolate Into Double-Quoted Strings

In double-quoted strings, arrays now interpolate, no matter what. The behavior in perl 5 was that arrays would interpolate into strings if the array had been mentioned before the string was compiled, and otherwise Perl would raise a fatal compile-time error. In versions 5.000 through 5.003, the error was

    Literal @example now requires backslash

In versions 5.004_01 through 5.6.0, the error was

    In string, @example now must be written as \@example

The idea here was to get people into the habit of writing C<"fred\@example.com"> when they wanted a literal C<@> sign, just as they have always written C<"Give me back my \$5"> when they wanted a literal C<$> sign.

Starting with 5.6.1, when Perl now sees an C<@> sign in a double-quoted string, it I<always> attempts to interpolate an array, regardless of whether or not the array has been used or declared already. The fatal error has been downgraded to an optional warning:

    Array @example will be interpolated in string

This warns you that C<"fred@example.com"> is going to turn into C<fred.com> if you don't backslash the C<@>.

See L<http://www.plover.com/~mjd/perl/at-error.html> for more details about the history here.
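The new behavior is easy to demonstrate; the contents of @example here are made up purely for illustration:

```perl
use strict;
use warnings;

# With an @example array in scope, the unescaped @ silently
# interpolates; the backslashed form keeps the literal @ sign.
my @example = ('INTERPOLATED');
my $oops  = "fred@example.com";    # becomes "fredINTERPOLATED.com"
my $right = "fred\@example.com";   # the literal address
print "$oops\n$right\n";
```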
TAI time
TAI is an international time standard. It has a number of technical advantages over UTC. One of these advantages is that it doesn't have any silly truck with leap seconds.

Dan Bernstein has defined a time format called TAI64 which is based on TAI. The format is very simple. TAI64 is almost compatible with Unix epoch time. TAI64 has a resolution of one second and a range of about 300 billion years.

Bernstein has a small, good-quality library of code for manipulating TAI64 values and for converting them to and from UTC or unix epoch time. The library is in the public domain, so there can't be any license or copyright objection to including it in Perl.

TAI64 has one-second precision, but there are extensions to it, TAI64N and TAI64NA, with nanosecond and attosecond precision. libtai handles these extensions also.

libtai has functions to convert calendar dates and times (such as "March 27 1823") into TAI values and back, and to input and output date and time strings. It has functions for addition, subtraction, and comparison of TAI times. The interface is simple and well-documented.

If we're going to standardize on a single time format for all platforms, I wish we could choose a good format. Unix time runs out in 2038.

The libtai blurb is at:

    http://cr.yp.to/libtai.html

I've included this below.

Public-domain source code for libtai:

    http://cr.yp.to/libtai/libtai-0.60.tar.gz

The spec for TAI64:

    http://cr.yp.to/libtai/tai64.html

BLURB:

libtai is a library for storing and manipulating dates and times.

libtai supports two time scales: (1) TAI64, covering a few hundred billion years with 1-second precision; (2) TAI64NA, covering the same period with 1-attosecond precision. Both scales are defined in terms of TAI, the current international real time standard.

libtai provides an internal format for TAI64, struct tai, designed for fast time manipulations. The tai_pack() and tai_unpack() routines convert between struct tai and a portable 8-byte TAI64 storage format.
libtai provides similar internal and external formats for TAI64NA.

libtai provides struct caldate to store dates in year-month-day form. It can convert struct caldate, under the Gregorian calendar, to a modified Julian day number for easy date arithmetic.

libtai provides struct caltime to store calendar dates and times along with UTC offsets. It can convert from struct tai to struct caltime in UTC, accounting for leap seconds, for accurate date and time display. It can also convert back from struct caltime to struct tai for user input. Its overall UTC-to-TAI conversion speed is 100x better than the usual UNIX mktime() implementation.

This version of libtai requires a UNIX system with gettimeofday(). It will be easy to port to other operating systems with compilers supporting 64-bit arithmetic.

The libtai source code is in the public domain.
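For concreteness, the arithmetic behind the external TAI64 format is tiny. This sketch hard-codes the 10-second TAI-UTC offset at the 1970 epoch instead of consulting a real leap-second table, so it is an approximation and no replacement for libtai; it also assumes a perl built with 64-bit integers:

```perl
use strict;
use warnings;

# TAI64 labels second s as the big-endian 64-bit value 2^62 + s;
# libtai's tai_now() computes 2^62 + 10 + unix_time.  Real code must
# apply the full leap-second table; the fixed offset below is a
# deliberate simplification.
use constant TAI64_BASE => 0x4000000000000000;   # 2^62

sub unix_to_tai64_hex {
    my ($unix_seconds, $leap_offset) = @_;
    $leap_offset = 10 unless defined $leap_offset;   # 1970 base offset only!
    return sprintf '%016x', TAI64_BASE + $unix_seconds + $leap_offset;
}
```

Under that assumption, unix_to_tai64_hex(0) yields the label 400000000000000a for the unix epoch, which is the same value libtai's tai_now() would pack for time zero.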
Re: TAI Time
I agree with Tim that it's a red herring that unix systems don't normally have access to a TAI source.

The proposal under discussion is to use one time format for all platforms. So maybe there's a minor difficulty in converting unix time to TAI time; probably it's not as large as the difficulty in converting VMS time (for example) to whatever other platform-independent standard we were going to agree on. When you have one platform-independent standard, you necessarily accept that there are going to be conversion difficulties from the native time format to the standard format. Converting from unix time to TAI is one of the smaller such difficulties.

We could hack Dan's library so that it carries the leap second table internally, and only tries to fall back to the file when the date is out of range of the internal table.

I was about to start a discussion of what this would mean for calls like localtime(20) and then I realized that, as usual, I have no idea what the RFC is actually proposing.
Re: RFC 138 (v1) Eliminate =~ operator.
It seems to me that there are at least two important things missing from this proposal.

1. There is no substantive rationale presented for why the change would be desirable. The only reasons you put forth are:

   * The syntax is ugly and unintuitive.

   Ugliness is a matter of opinion, and I don't think it has a place here. Anyone else could simply reply, "Well, I think the =~ notation is beautiful and elegant, and your notation is ugly and clumsy," and there is no arguing with this point of view.

   Intuition varies from person to person. I think the proposal would be stronger if you would discuss some specific technical problems with the existing notation. Normally when we say that a notation is 'unintuitive' what we mean is that it works differently from the way people expect it to, so that they use it incorrectly. You have not provided any examples of how =~ is used incorrectly.

   * It performs a function that is semantically no different from other forms of argument passing.

   The same could be said for any operator, including +, and in fact some languages do treat + as a function whose operands are passed as arguments. For example, in Lisp, (my-function arg1 arg2 arg3) and (+ arg1 arg2 arg3) are syntactically identical. Since your argument here applies as well to +, -, ->, etc., it is not clear why your proposal is for =~ and not for +, -, ->, also.

   I think you should add some sections to the proposal explaining what the benefits of your proposed change would be.

2. The other thing that I think is missing from the proposal is a discussion of precedence issues. For example, you did not say what

       /pat/ $x . $y ;

   would do. Is it equivalent to

       /pat/ ($x . $y) ;

   or to

       (/pat/ $x) . $y ;

   ? I also worry that there may be some lexical issues lurking here. Are you sure that it's never ambiguous whether a particular / will indicate the start of a pattern match or a division operator? I would like to see some discussion of this.
I have several other complaints (I think you should either remove the wacky ideas, or treat them fully) but these are my main worries about the proposal.
Summary of regex-related RFCs so far
Several RFCs have been issued that relate to regexes or pattern matching but which predate the perl6-language-regex list. I have asked the librarian to transfer ownership of these RFCs to this list. In the meantime, here is a summary of the outstanding regex-related RFCs:

72 (v1): The regexp engine should go backward as well as forward.

    It is proposed that the regular expression engine should be designed
    so that, when it is in an accepting state, it will either consume the
    character after the end of the currently matching part of the target
    string, or the character just before its beginning, depending on
    instructions embedded in the regexp itself.

93 (v1): Regex: Support for incremental pattern matching

    This RFC proposes that, in addition to strings, subroutine references
    may be bound (with =~ or !~ or implicitly) to a regular expression.

110 (v1): counting matches

    Provide a simple way of giving a count of matches of a pattern.

112 (v1): Assignment within a regex

    Provide a simple way of naming and picking out information from a
    regex without having to count the brackets.

135 (v1): Require explicit m on matches, even with ?? and // as delimiters.

    C<??> and C<//> are what makes Perl hard to tokenize. Requiring them
    to be written C<m??> and C<m//> would solve this.
Re: RFC 138 (v1) Eliminate =~ operator.
> I'm not concerned about / being mistaken for division, since that
> ambiguity already exists with bare /pat/ matches.

Yes, but the current ambiguity is resolved from context in a rather complicated way. Nevertheless it turns out that Perl does the right thing in most cases. You are proposing to change the context, and it's not clear that the result will be the right thing as often as in the past.

It may turn out that the new notation really does have exactly the same ambiguities, but that's not clear to me now. All I said was that I would like to see some discussion of it.
Re: RFC 158 (v1) Regular Expression Special Variables
> There's also long been talk/thought about making $& and $1
> and friends magic aliases into the original string, which would
> save that cost.

Please correct me if I'm mistaken, but I believe that that's the way they are implemented now. A regex match populates the ->startp and ->endp parts of the regex structure, and the elements of these items are byte offsets into the original string.
Re: RFC 144 (v1) Behavior of empty regex should be simple
> >I propose that this 'last successful match' behavior be discarded
> >entirely, and that an empty pattern always match the empty string.
>
> I don't see a consideration for simply s/successful// above, which
> has also been talked about.

Thanks, I will add this to the next version.

I did consider that, and I rejected it. Here's my thinking: s/successful// does make the feature somewhat more useful, but (a) all those uses are more easily accomplished with qr() these days, and (b) it's still an action-at-a-distance effect, which means that it's fragile and that the behavior of working code can change suddenly and surprisingly when it is modified.

If you have remarks about this topic that you think are missing, please do let me know.
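For readers who have not run into it, the behavior being discarded is easy to demonstrate:

```perl
use strict;
use warnings;

# An empty pattern reuses the last *successful* pattern, so the
# outcome of the second match depends on unrelated earlier code --
# the action-at-a-distance effect under discussion.
my $remembered = "banana" =~ /an/;   # succeeds; /an/ is remembered
my $surprise   = "xyzzy"  =~ //;     # actually retries /an/ -- fails
print $surprise ? "matched\n" : "no match\n";   # prints "no match"
```

Someone expecting "an empty pattern matches the empty string" would have predicted "matched" here, which is exactly the RFC's complaint.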
Re: RFC 158 (v1) Regular Expression Special Variables
> >Please correct me if I'm mistaken, but I believe that that's the way
> >they are implemented now. A regex match populates the ->startp and
> >->endp parts of the regex structure, and the elements of these items
> >are byte offsets into the original string.
>
> I haven't looked at it at all, and perhaps that's something Ilya
> did when creating @+ etc. So you might be right.

As far as I know it's the same in 5.000. I thought the problem with $& was that the regex engine has to adjust the offsets in the startp/endp arrays every time it scans forward a character or backtracks a character. But maybe the effect of $& is greatly exaggerated or is a relic from perl4? Has anyone actually benchmarked this recently?
Re: RFC 158 (v1) Regular Expression Special Variables
> But maybe the effect of $& is greatly exaggerated or is a relic from
> perl4? Has anyone actually benchmarked this recently?

Matching with $& enabled is about 40% slower.

    http://www.plover.com/~mjd/perl/amper.pl
Re: RFC 145 (v1) Brace-matching for Perl Regular Expressions
> What exactly is matched by \g and \G is controlled by two new special
> variables, @^g and @^G, which are arrays of strings.

These sorts of global variables have been a problem in the past. Since they change the meaning of the \g and \G escapes, I think they should be pragmas or some other declaration that has a lexical scope.

This puzzle actually pops up in your RFC:

    It is a run-time error to compile a regular expression that contains
    \g or \G while the @^g and @^G arrays do not contain the same number
    of elements.

If it is a run-time error to compile a regex, that means that the regex compilation is occurring at run time. That is a recipe for very slow regexes. Regex compilation needs to happen at compile time except in special cases.

If the declarations have lexical scope, then Perl will be able to optimize regexes that contain \g and \G. With @^G and @^g global variables, every regex that uses \g and \G will have to explicitly examine the global variables every time it wants to match \g or \G, because the values of @^g and @^G will not be known until run time and might vary. But if you have a lexically scoped declaration instead of a global variable, then Perl will be able to compile \g as if you had said [()] or whatever, and \G similarly. This will make the regex engine run faster.

(As a side note, there is no such variable as $^g, so you will have to think of something else to call it. Perhaps ${^Group_Open} and ${^Group_Close}?)

(Also, the \G escape already has a meaning in Perl 5, so it would probably be better to think of some other name.)

> =head1 PROBLEMS
>
> How should a \G without a prior \g be interpreted in a regular expression?

I don't think that's a big problem. One reasonable option is to make it a compile-time error:

    \G without preceding \g in pattern at ...

So presumably Larry will be able to think of other reasonable behaviors also.
The big problem I see that you didn't address is that you didn't say what would happen when the target string contains mismatched parentheses. Your example was:

    $string = "([b - (a + 1)] * 7)";
    $string =~ /\g.*?\G/;

Now here \g matches the "(" and sets up \G so that \G will only match the corresponding ")". Then .*? matches "[b - (a + 1)] * 7" and \G matches the ")".

Now suppose the string were

    $string = "(b - a + 1] * 7)";
    $string =~ /\g.*?\G/;

Now what happens here? \g matches "(" and sets up \G so that \G will only match the corresponding ")". Then what? I'm not sure from your proposal. Your later example (in the 'implementation' section) suggests that '[' and ']' are ignored once \g matches a '('. If that is true, then in the example above, the .*? would match "b - a + 1] * 7". I think this won't be what people will want from \g...\G. We are still going to get a lot of questions from people asking how to tell if the delimiters in a string are balanced.

(Side note: I'm not sure why you used .*? here instead of .*, since as I understand your proposal, .* would have done the same thing. I suggest that you change .*? to .* or else add a remark about why this would be different.)

Another ambiguity in your proposal: You want [\g] to match any single open delimiter character. But then later on you have an example where @^g contains the string "/*". What would [\g] do in this case?

> As it continues scanning, it encounters the "]" between the "f" and the
> ")". The \G does not match this "]" character, because the \g must match
> a ")".

You mean \G here instead of \g, don't you?

>     sub parse
>     {
>         my $string = shift;
>         while ($string =~ /([^\g])*(\g)(.*?)(\G)([^\g\G]*)/g)

Don't you mean ([^\g]*) instead of ([^\g])* here?
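What people usually actually want from a \g...\G mechanism -- checking that delimiters balance -- became expressible in later perls (5.10 and up) with a recursive subpattern, with no new escapes or global variables. This sketch checks parentheses only, not the mixed-delimiter correspondence the RFC is after:

```perl
use strict;
use warnings;

# Recursive subpattern (perl 5.10+): group 1 matches a parenthesized
# group whose contents are runs of non-parens or, via (?1), further
# balanced groups.  Anchored, so the whole string must be one group.
my $balanced = qr{ \A ( \( (?: [^()]++ | (?1) )* \) ) \z }x;

print "([b - (a + 1)] * 7)" =~ $balanced ? "balanced\n" : "unbalanced\n";
print "(b - (a + 1 * 7)"    =~ $balanced ? "balanced\n" : "unbalanced\n";
```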
Re: RFC 110 (v3) counting matches
> Drawing on some of the proposals for extended 'for' syntax:
>
>     for my($mo, $dy, $yr) ($string =~ /(\d\d)-(\d\d)-(\d\d)/g) {
>         ...
>     }
>
> This still requires that you know how many () matching groups are in
> the RE, of course. I don't think I would consider that onerous.

If the regex is fixed at compile time, you can simply count. But if the regex varies at run time, it's not only onerous, it's pretty near to impossible.
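As a side note, for a regex that only becomes known at run time, the @+ array that perl 5.6 introduced does expose the count after a successful match -- $#+ is the number of capture groups -- which softens this particular objection a little:

```perl
use strict;
use warnings;

# After a successful match, @+ holds end offsets for $&, $1, $2, ...,
# so $#+ is the number of capture groups -- even when the pattern
# arrived at run time and could not be counted by eye.
my $pat = '(\d\d)-(\d\d)-(\d\d)';   # imagine this was read from input
"04-23-64" =~ /$pat/ or die "no match";
my $ngroups = $#+;                  # 3
print "$ngroups groups\n";
```

It only tells you the count after a match has already succeeded, though, so it does not help a construct that needs the count up front.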
Re: RFC 110 (v3) counting matches
> > 1. Return the number of matches
> >
> > 2. Iterate over each match in sequence
> >
> > 3. Return list of all matches
> >
> > 4. Return a list of backreferences
>
> Please see RFC 164. It can handle all of 1-3.

You seem to have missed my point. I'm not asking for a notation that can do all these four things. We have such a notation already. I'm asking for a notation that does these things *orthogonally* and *consistently*. As nearly as I can tell RFC 164 doesn't address this at all. It's basically syntactic sugar for the same mess we have now. If I am mistaken, please correct me.
Re: RFC 110 (v3) counting matches
> > $count = () = $string =~ /pattern/g;
>
> Which I find cute as a demonstration of Perl's context concept,
> but ugly as hell from a usability viewpoint.

I'd really like to see an RFC that looks into making the following features more orthogonal:

1. Return the number of matches

2. Iterate over each match in sequence

3. Return list of all matches

4. Return a list of backreferences

Perl presently uses various combinations of /g and scalar/list context to get these. But some useful variants are missed. For example, suppose you have a string like this:

    "04-23-64 02-13-62 02-01-99 05-13-18 08-10-99"

You can run a loop once for each date:

    while ($string =~ /\d\d-\d\d-\d\d/g) { ... }

You can also extract the month-day-year parts of the first date:

    ($mo, $dy, $yr) = ($string =~ /(\d\d)-(\d\d)-(\d\d)/);

But there is no convenient way to run the loop once for each date and split the dates into pieces:

    # WRONG
    while (($mo, $dy, $yr) = ($string =~ /(\d\d)-(\d\d)-(\d\d)/g)) { ... }

This is an infinite loop. It sets $mo $dy $yr to 04 23 64, repeatedly. One solution here is:

    while ($string =~ /\d\d-\d\d-\d\d/g) {
        ($mo, $dy, $yr) = ($& =~ /(\d\d)-(\d\d)-(\d\d)/);
        ...
    }

Not only do you have to use $&, but you also have to write the pattern twice. Another solution:

    @matches = ($string =~ /(\d\d)-(\d\d)-(\d\d)/g);
    while (@matches) {
        ($mo, $dy, $yr) = splice @matches, 0, 3;
        ...
    }

This is clumsy, and it doesn't work unless you know in advance how many backreference groups the pattern will contain. (Perl knows, and this number is part of the struct regexp, but there is no way to get Perl to tell you.)

My wish list for better orthogonality is actually a little longer than the four items above, but the other items are more abstruse.
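For this particular example there is a third workaround -- a scalar-context /g loop refreshes $1, $2, $3 on each iteration, so the pattern need only be written once -- though it still hard-codes the number of groups, which is the deeper complaint here:

```perl
use strict;
use warnings;

# Scalar-context /g advances through the string one match at a time,
# and the capture variables are set afresh on every iteration.
my $string = "04-23-64 02-13-62 02-01-99 05-13-18 08-10-99";
my @dates;
while ($string =~ /(\d\d)-(\d\d)-(\d\d)/g) {
    my ($mo, $dy, $yr) = ($1, $2, $3);
    push @dates, "$mo/$dy/$yr";
}
print "@dates\n";
```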
Re: RFC format
Nat Torkington writes:
> Mark-Jason Dominus writes:
> > RFC should have a section that addresses the feasibility of
> > translating perl5 to perl6 code if the proposed change is adopted.
> > This section should be required.
>
> I agree.
>
> Ziggy, want to patch the sample RFC and the RFC format document?

Since you haven't had a chance to do this yet, I thought it might
help if I supplied a patch.

--- rfc-format.html     2000/08/29 16:33:46     1.1
+++ rfc-format.html     2000/08/29 16:35:30
@@ -44,7 +44,7 @@
 Format

 RFCs are written in POD.  rfc-sample.pod is a sample.  The important
 sections are: TITLE, VERSION, ABSTRACT,
-DESCRIPTION, IMPLEMENTATION, and REFERENCES.  An optional section is STATUS.
+DESCRIPTION, IMPLEMENTATION, TRANSLATION, and REFERENCES.  An optional
+section is STATUS.

 A description of each section follows:
@@ -117,6 +117,16 @@
 Discussion of the possible implementations.  This doesn't have to be
 completely defined down to the char *, instead enough to show that it
 can be done.
+
+TRANSLATION
+
+Discussion of the issues involved in translating old Perl 5 code to
+Perl 6 code.  If a Perl 5 feature is being eliminated, can it be
+emulated in Perl 6?  If a feature is being changed, can the old
+behavior be achieved with the new feature?  Remember that it must be
+possible to perform the translation automatically.

--- rfc-sample.pod      2000/08/29 16:38:41     1.1
+++ rfc-sample.pod      2000/08/29 16:38:13
@@ -47,6 +47,17 @@
 new model of signal handling which would make it difficult to reuse
 algorithms and code for systems programming from C.

+=head1 TRANSLATION
+
+In the 'Checkpointing' scenario, Perl 5 code would run without change.
+
+In the 'Event Loop' scenario, Perl would be supplied with a module,
+possibly Signal.pm, which provided a magical %SIG array which would
+emulate the old behavior.  Installing a handler into %SIG would
+actually register an event handler with Perl's event loop.  Using %SIG
+would automatically load this module, similar to the way Error.pm is
+loaded automatically when %! is used.
+
 =head1 REFERENCES

 RFC 6: "Standard Event Loop"
Re: RFC 166 (does-not-match)
> This is going to need a much better definition...

Yes, that was my point.  I snipped the following discussion, in which
you argued against a suggestion that I advanced only as an example of
something that would not work.

> (?^baz) should behave as (.*)(?{$1 !~ /baz/})

I don't think that's going to do it.  Consider this pattern:

        /foo(?^baz)baz/

Here I am trying to match strings like "foobarbaz" and "foo---baz"
that have a foo and a baz separated by something else that is not a
baz.  But with your definition,

        "foobazbaz" =~ /foo(?^baz)baz/

is true, when I wanted it to be false.  This is because the (?^baz)
matches the empty string after the 'o', and the "baz" in the pattern
matches the first baz in the string, instead of the second one.

> I think one should outlaw .* before or after a (?^foo) construct, as
> the result is meaningless.

As it stands now the whole notion is meaningless, because you have
not given it a meaning.  Can you provide a detailed explanation of
just what is and what is not outlawed?  I presume that .+ is also
forbidden.  What about a*, .?, .{3}, etc.?  I wonder if this
restriction is really necessary?

> I can tighten the definition up.  If there have been calls for a
> (?^baz) type construct before, there will be again.  It is a matter
> of getting the definition straightforward and useable.

Yes, I agree completely.  I am looking forward to the next version of
your RFC.
Re: RFC 110 (v3) counting matches
OK, I think this discussion should be closed. Richard should add a section to RFC110 that discusses the $count = () = m/PAT/g; locution and its advantages and disadvantages compared to his proposal, duly taking into account the many valuable comments that have been made. Thanks to everyone who participated in the discussion.
Proposal for IMPLEMENTATION sections
The IMPLEMENTATION section of the RFC is supposed to be mandatory,
but there have been an awful lot of RFCs posted that have missing or
evasive IMPLEMENTATION sections.  I found that more than 39% of all
RFCs have a missing or incomplete implementation section.  Here are
the results of my survey.

Of 166 total RFCs (numbers 1-167, except #41):

RFCs: 24 25 69 70 80 81 106 128 132 147 148 159 164

        These 13 ( 8%) had very brief IMPLEMENTATION sections that
        didn't contain any substantive discussion.  In these cases I
        judged that an implementation section would have been
        desirable.  (Some RFCs do not need implementation sections; I
        have enumerated these separately below.)  In some cases the
        section was actually flippant.  #147 is a good example here.

RFCs: 21 26 62 84 88 110 112 131 136 137 140 149 162 165 166

        These 15 ( 9%) had no IMPLEMENTATION section at all.  I was
        surprised that the librarian had even accepted these, since
        that section is not described as 'optional' in the RFC format
        document.

RFCs: 97 100

        These 2 ( 1%) said that implementation discussion was beyond
        the scope of the RFC, which I don't understand, since it
        clearly *is* part of the scope of the RFC.

RFCs: 8 12 21 23 31 40 53 54 55 58 59 72 73 93 103 104 120 133 134 150 167

        These 21 (13%) contained remarks about the author's
        ignorance.  For example:

                #53: "Dammit, Jim, I'm a doctor, not an engineer!"
                #93: "I'll leave that to the internals guys. :-) "
                #40: "I've no real concrete ideas on this, sorry."

RFCs: 5 6 14 30 39 45 64 75 87 89 109 113 115 160

        These 14 ( 8%) contain IMPLEMENTATION sections, but do not
        actually discuss implementation.  Instead, they contain more
        or less detailed discussions of the *interfaces* to the
        proposed new features.  I recommend that a change be made to
        the RFC metadocuments to make the purpose of the
        implementation section clearer.

This makes a total of 65 (39%) that have missing or bogus
implementation sections.
Of the remainder:

RFCs: 1 2 3 4 10 11 13 17 18 19 22 27 32 35 36 37 38 42 43 44 46 47
      48 49 50 51 52 56 57 60 61 63 65 66 67 71 78 79 82 83 85 86 90
      92 95 96 98 99 108 111 116 117 119 121 123 124 129 130 135 138
      139 142 143 145 146 151 152 153 154 155 156 157 158 161 163

        These 75 (45%) appeared to contain explanations of
        implementation issues.  In some cases the discussion seemed
        clearly deficient, but that's not the problem I'm trying to
        address in this message.  I did not try to judge whether or
        not the discussion was cogent or to the point, but only
        whether a good-faith effort had been made to identify and
        discuss the issues.

RFCs: 7 16 33 34 68 74 76 77 91 94 102 107 114 118 121 144

        These 16 (10%) said something along the lines of "The
        implementation should be straightforward."  I did not try to
        judge whether this was actually true.

RFCs: 9 28 29 101 105 125 126 127 141

        These 9 ( 5%) don't contain any substantive discussion of
        implementation issues, because it is not appropriate or
        necessary.  For example, #125 is "Components in the Perl Core
        Should Have Well-Defined APIs and Behavior" and #28 is "Perl
        should stay Perl".

Summary:

        Have implementation section:            75 (45%)
        Should have implementation but do not:  65 (39%)
        "Implementation is straightforward":    16 (10%)
        Don't need implementation section:       9 ( 5%)

I don't think this is a good thing.  People are proposing all sorts
of stuff without thinking even a little bit about how it might be
implemented.  I think the proposals might be more carefully thought
through if the proposers were not allowed to evade thinking about
implementations.

Not everyone knows enough about Perl's internal design or about
programming design generally to be able to consider the issues.  I
suggest that these people should write to the appropriate working
group chair and ask to be put in touch with someone who can help them
with the internals sections of their RFC.  Then they can work out
some of the details together, and we might avoid some of the more
obviously half-baked suggestions.
Re: Proposal for IMPLEMENTATION sections
> > These 13 ( 8%) had very brief IMPLEMENTATION sections that
> > didn't contain any substantive discussion.
> >
> > These 21 (13%) contained remarks about the author's ignorance.
> >
> > These 15 ( 9%) had no IMPLEMENTATION section at all.
>
> The distinction between these three cases is arbitrary and trivial,
> being as they are more a reflection of the authors' tastes.

No, that is not true.  The distinction between the first two groups
is trivial.  The third group is a group of RFCs that were published
even though the supposedly required section was omitted.

I mentioned the remarks about the authors' ignorance because it
seemed to me that these were people who might have appreciated being
hooked up with someone who could help make their RFCs stronger.

> I wish you had applied the standard more evenly; imho, 97 & 100 had
> good reasons for their cursory treatments of implementation.

Sorry.  I should have put in a disclaimer that I did the survey very
quickly and I didn't try to be consistent.  The important point of
the survey was that many RFCs that should have implementation
sections lack them.  The details about why the section was omitted
are there mostly to pander to curiosity.

I'd like to amend my proposal.  Suppose that the librarian *suggests*
that RFC authors contact the WG chair when they submit RFCs that omit
the implementation section?  That way nobody is forced to do
anything, and many people might be grateful for the service.
Re: RFC 165: Allow variables in a tr///
> Would there be any interest in adding these two ideas to this RFC:
>
> 1) tr is not regex function, so it should be regularized to
>
>       tr(SEARCH, REPLACE, MOD, STR)

MOD should be last, because you're frequently going to want to omit
MOD.  But I think this is worth discussing further, because it neatly
accomplishes the goal of the RFC in a straightforward way:

        tr('a-z', 'A-Z', $str)

replaces a-z with A-Z, and

        tr($foo, $bar, $str)

replaces the characters from $foo with the characters from $bar.  No
special syntax is necessary.  People might even stop writing things
like tr/[a-z]/[A-Z]/ if we did that.
Re: RFC 165 (v1) Allow Varibles in tr///
> =head1 IMPLENTATION
>
> No idea, but should be straight forward.

I think the reason this hasn't been done before is that it's *not*
quite straightforward.

The way tr/// works is that a 256-byte table is constructed at
compile time that says, for each input character, what output
character is produced.  Then when it's time to apply the tr/// to a
string, Perl iterates over the string one character at a time, looks
up each character in the table, and replaces it with the
corresponding character from the table.

With tr///e, you would have to generate the table at run time.  This
suggests that you want the same sorts of optimizations that Perl
applies when it encounters a regex that contains variables:

        1. Perl should examine the strings to see if they have
           changed since the last time it executed the code.

        2. It should rebuild the tables only if the strings changed.

        3. There should be a /o modifier that promises Perl that the
           variables will never change.

The implementation could be analogous to the way m/.../o is
implemented, with two separate op nodes: one that tells Perl
'construct the tables' and one that tells Perl 'transform the
string'.  The 'construct the tables' node would remove itself from
the op tree if it saw that the tr///o modifier was used.
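To make the table mechanism concrete, here is a rough sketch of the same two steps done by hand in Perl.  The names build_table and apply_table are mine, and this handles only the simple byte-string case (no ranges, no /c /d /s options); it is the 'construct the tables' step that tr///e would have to defer to run time:

```perl
# Build a 256-entry map: input character => output character.
sub build_table {
    my ($search, $replace) = @_;
    my @table = map { chr } 0 .. 255;       # identity mapping by default
    my @s = split //, $search;
    my @r = split //, $replace;
    for my $i (0 .. $#s) {
        # tr/// reuses the last replacement char if REPLACE is shorter
        $table[ord $s[$i]] = $i <= $#r ? $r[$i] : $r[-1];
    }
    return \@table;
}

# Apply the table one character at a time, as the tr op does.
sub apply_table {
    my ($table, $string) = @_;
    return join '', map { $table->[ord $_] } split //, $string;
}

my $t = build_table('fo', 'ba');            # f => b, o => a
print apply_table($t, 'foolproof'), "\n";   # baalpraab
```

Building the table is cheap (one pass over SEARCH and REPLACE), which is why deferring it to run time and caching it, as with m/$pat/, seems plausible.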
Re: RFC 165: Allow variables in a tr///
> When does the structure get built?  That's why eg. tr[a-z][A-Z]
> brooks no variables, for it is solely at compile time that these
> things occur, and why you must resort to delayed compilation via
> eval qq/.../ to prod the compiler into building you a new one.

Certainly.  But if there were no variable interpolation for regexes,
you could make the same argument about regexes.  I don't see any
reason why the regex solution couldn't or shouldn't be extended to
tr/// also.  If the pattern and replacement sets contain variables,
then table construction can be deferred until run time; if there are
no variables, the table is computed at compile time.

Building a tr/// table is much much simpler and much less work than
compiling a regex, but we don't make people write

        eval " \$s =~ m/$pat/ "

when they want to interpolate a string into a regex at run time.
Instead, we take care of it transparently.  tr/// could easily be
made to work the exact same way.

> Maybe you want qt/.../.../ or something.

I don't think a new notation is necessary in this case.  All that's
needed is a small extension to the existing semantics, in a direction
that has already been thoroughly investigated.
Re: RFC 165: Allow variables in a tr///
> One thing to be careful of there is thread safety.  You can't hand
> the data off the syntax node (the one with the tr op on it), because
> tr/$foo/$bar/ wouldn't work for several threads in it at the same
> time then.

Certainly, but that is true for everything else that is in the op
node, which includes the pattern in m/.../o.  One of my hopes is that
the Perl 6 internals will fix this long-standing error, in which case
the solution they adopt will apply to tr///e in the same way that it
will to m//o and ?? and X...Y and all the rest.
Re: RFC 110 (v2) counting matches
> /t is suggested for "counT", as /c is already taken.  Using /t
> without /g would be result in only 0 or 1 being returned, which is
> nearly the existing syntax.

It occurs to me that since none of the capital letters are taken, we
could adopt the convention that a capital letter as a regex modifier
will introduce a *word* which continues up to the next comma.  So for
example:

        m/.../Count              (instead of m/.../t)
        m/.../iCount             (instead of m/.../it)
        m/.../Count,i            (instead of m/.../ti)
        m/.../Count,Insensitive  (instead of m/.../ti)

That would escape the problem that we are running out of letters and
also the problem that the current letters are hard to remember.
Re: RFC 110 (v3) counting matches
> On Mon, 28 Aug 2000, Mark-Jason Dominus wrote:
>
> > But there is no convenient way to run the loop once for each date
> > and split the dates into pieces:
> >
> >     # WRONG
> >     while (($mo, $dy, $yr) = ($string =~ /(\d\d)-(\d\d)-(\d\d)/g)) {
> >       ...
> >     }
>
> What I use in a script of mine is:
>
>       while ($string =~ /(\d\d)-(\d\d)-(\d\d)/g) {
>         ($mo, $dy, $yr) = ($1, $2, $3);
>       }
>
> Although this, of course, also requires that you know the number of
> backreferences.

The real problem I was trying to discuss was not this particular
application.  I was trying to point out a larger problem, which is
that there are several regex features that are enabled or disabled
depending on what context the match is in, so that if you want one
scalar-context feature and one list-context feature at the same time,
there is no direct way to do it.

> Nicer would be to be able to assign from @matchdata or something
> like that :)

I agree.  There are many operations that would be simpler if there
was a magic array that contained ($1, $2, $3, ...).  If anyone wants
to write an RFC on this, I will help.
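For what it's worth, the @- and @+ offset arrays added in Perl 5.6.0 already make a rough version of this possible: they hold the start and end offsets of $& and each capture group, so you can pull out every group without knowing how many there are.  A sketch:

```perl
my $string = "04-23-64 02-13-62";
my @rows;
while ($string =~ /(\d\d)-(\d\d)-(\d\d)/g) {
    # $-[$n] and $+[$n] are the start and end offsets of group $n;
    # $#- is the index of the last group that matched.
    my @groups = map { substr($string, $-[$_], $+[$_] - $-[$_]) } 1 .. $#-;
    push @rows, "@groups";
}
print "$_\n" for @rows;    # "04 23 64" then "02 13 62"
```

It is still an awkward spelling compared to a real @matchdata array, which is presumably the point of the suggestion.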
Re: RFC 110 (v2) counting matches
> On Tue, 29 Aug 2000 08:47:25 -0400, Mark-Jason Dominus wrote:
>
> >     m/.../Count,Insensitive  (instead of m/.../ti)
> >
> > That would escape the problem that we are running out of letters
> > and also the problem that the current letters are hard to
> > remember.
>
> Yes, but wouldn't this give us backward compatibility problems?  For
> example, code like
>
>       $result = m/(.)/Insensitive, ord $1;

No, because that is presently a syntax error.  The one you have to
watch out for is:

        $result = m/(.)/s,Insensitive, ord $1;

> And, I don't really see the need for the comma.
>
>       m/.../CountInsensitive  (instead of m/.../ti)

I guess, but to me CountInsensitive looks like one option, not two.
Re: RFC 164 (v1) Replace =~, !~, m//, and s/// with match() and subst()
> Make your suggestions.  But I think it is all off-base.  None of
> this is addressing some improvement in working conditions, ease of
> use, problems in the language, etc.

1. I don't agree.

2. This mailing list is also for discussing stylistic improvements to
   the language.

3. If you think people are talking about the wrong things, then you
   should submit your own RFCs on the right things, instead of
   complaining about what other people are doing.  I have not seen
   any RFCs from you.

> MJD's // killer RFC is a headache.

I would appreciate a clear discussion of why that is.  That is what
we are here for.  If the RFC does not lay out clearly what problem it
is trying to solve, that is a problem with the RFC and it's something
we should discuss on the list.  However, this comment by itself is
not useful.

> I don't see how this solves an already existing problem.

I didn't either, and I objected to RFC 138 on that basis.  But Larry
said:

# Well, the fact is, I've been thinking about possible ways to get rid
# of =~ for some time now, so I certainly don't mind brainstorming in
# this direction.

So I consider the metasubject (of whether we should be discussing
that topic at all) to be officially closed.
RFC 166 (does-not-match)
Richard Proctor's RFC 166 says:

> =head2 Matching Not a pattern
>
> (?^pattern) matches anything that does not match the pattern.  On
> its own, one can use !~ etc to negatively match patterns, but to
> match a pattern that has foo(anything but not baz)bar is currently
> difficult.  With this syntax it would simply be /foo(?^baz)bar/.

The problem with this proposal is that it's really unclear what it
means.  The reason we don't have this feature today is not that it
has never been thought of before.  People have thought of this a
hundred times.  The problem is that nobody has ever figured out how
it should work.  I don't mean that the implementation is difficult.
I mean that nobody understands what such a feature actually means.

Richard doesn't say this in his RFC, even for the simple examples he
raises.  He just assumes that it will be obvious, but it isn't.

        "foo-bazbar"  =~ /foo(?^baz)bar/        # true or false?
        "foo-baz-bar" =~ /foo(?^baz)bar/        # true or false?

OK, I'm going to try to invent a meaning for (?^baz).  I'm going to
choose what appears to be a reasonable choice, and see what happens.
Let's suppose that what (?^baz) means is "match any substring that is
not 'baz'."  That is a reasonably clear meaning.  Then it behaves
like

        (.*)(?{$1 ne 'baz'})

does today.  Then the examples above are both true.

Now let's see how that choice works out.

        "foobaz" =~ /foo.*(?^baz)/

This is TRUE, because "foo" matches "foo", ".*" matches "baz", and
"(?^baz)" matches the empty string at the end, which is a substring
that is not "baz".

In fact, with this apparently reasonable choice of meaning for
(?^baz), /foo.*(?^baz)/ will match anything that /foo.*/ will.  The
(?^baz) has hardly any effect at all.

It is a good thing that we did not implement it that way, because it
is sure to become an instant FAQ: "Why does /foo.*(?^baz)/ match
'foobaz'?"  You are going to see this question in comp.lang.perl.misc
every week.

So this choice I made for the meaning of (?^baz) appears to have been
the wrong one.  I could go on and make a different reasonable-seeming
choice and show what was wrong with it, but I don't want to belabor
my point, which is: Every choice anyone has ever made for the meaning
of (?^baz) has always been the wrong one for one reason or another.
So without a detailed explanation of what (?^baz) might mean,
suggesting that Perl 6 have one is not helpful.
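For the record, one precise meaning that *can* be written in standard Perl 5 today uses a negative lookahead at each position: "foo and baz separated by at least one character, with no 'baz' starting anywhere in between."  This is only one of several plausible readings of (?^baz), which is exactly why the RFC needs to pick one and define it:

```perl
# At each position in the separator, assert that 'baz' does not start
# here, then consume one character.  The + requires a non-empty
# separator.
my $re = qr/foo(?:(?!baz).)+baz/;

print "foobarbaz" =~ $re ? "match" : "no match", "\n";   # match
print "foo---baz" =~ $re ? "match" : "no match", "\n";   # match
print "foobazbaz" =~ $re ? "match" : "no match", "\n";   # no match
```

Change the + to a * and "foobazbaz" matches again (empty separator, first baz); that one-character difference is the sort of ambiguity the discussion above is about.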
RFC 166 (disambiguator)
Richard Proctor suggests that (?) will match the empty string.  Then
it can be inserted into regexes to separate elements that need to be
separated.  For example, /$foo(?)bar/ interpolates the value of $foo
and then looks for that pattern followed by 'bar'.  You cannot simply
write /$foobar/ because then Perl tries to interpolate $foobar, which
is not what you wanted.

1. You can already write /${foo}bar/ to get what you wanted.  This
   solution already works inside of double-quoted strings.  (?) would
   not work inside of double-quoted strings.

2. You can already write /$foo(?:)bar/ to get what you wanted.  This
   is almost identical to what Richard proposed anyway.

It is really not clear to me that this problem needs to be solved any
better than it is already.  I suggest that this section be removed
from the RFC.

Mark-Jason Dominus                              [EMAIL PROTECTED]
I am boycotting Amazon.  See http://www.plover.com/~mjd/amazon.html
for details.
Re: RFC 110 (v3) counting matches
> > > solution to execute perl code inside a string, replacing
> > > "${\(...)}" and
> >
> > The first one doesn't work, and never did.  You want
> > @{[]} and @{[scalar ]} instead.
>
> "Doesn't work"?

I think what Tom means is that (for example)

        print "${\(localtime())}\n";

does not produce "Tue Aug 29 19:15:55 2000".

Anyway, this is off-topic for this mailing list, so let's put an end
to this part of the discussion unless it relates somehow to regexes.
Re: RFC 110 (v3) counting matches
> On Tue, 29 Aug 2000, Mark-Jason Dominus wrote:
>
> > OK, I think this discussion should be closed.
>
> I think the bit about "having a special array containing all
> captured matches" might well still live on.  The "counting" bit
> _per se_ is probably fairly closed, though.

I didn't mean to close the discussion about counting.  The only part
of the discussion that I thought should be closed was the argument
about whether

        $count = () = m/.../g;

was a good idea, and the following discussion that was all about
context issues and context operators and had nothing to do with
regexes.  Sorry that this was unclear.
Re: RFC 165 (v1) Allow Varibles in tr///
> Accepting variables in tr// makes no sense.  It defeats the purpose
> of tr/// - extremely fast, known transliterations.

The proposal extends tr/// to handle extremely fast transliterations
whose nature is not known at compile time.

> tr///e is the same as s///g:
>
>       tr/$foo/$bar/e  ==  s/$foo/$bar/g

It is nothing of the sort.

        $foo = 'fo';
        $bar = 'ba';
        $s1 = $s2 = "foolproof";

        $s1 =~ tr/$foo/$bar/e;          # The result is "baalpraab"
        $s2 =~ s/$foo/$bar/g;           # The result is "baolproof"
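Since tr/// doesn't interpolate today, both results can be checked with the delayed-compilation trick (eval) that the proposed tr///e would make unnecessary:

```perl
my $foo = 'fo';
my $bar = 'ba';

my ($s1, $s2);
$s1 = $s2 = "foolproof";

# What tr/$foo/$bar/e would mean: transliterate f => b and o => a.
# Today this takes a string eval to build the tr at run time.
eval "\$s1 =~ tr/$foo/$bar/";
print "$s1\n";                  # baalpraab

# s///g instead replaces each occurrence of the *string* 'fo'.
$s2 =~ s/$foo/$bar/g;
print "$s2\n";                  # baolproof
```

The two operations agree only by accident on some inputs; on "foolproof" they visibly diverge, which is the point of the example above.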
Re: RFC 165 (v1) Allow Varibles in tr///
> Note that the 256-byte thing is out the window with Unicode, but
> that I no longer know how it is done.

Thanks.  I was going to mention that, but I forgot before I sent the
message.  The 256-byte thing is still in place with Unicode, but it's
only used on byte strings, not on UTF8 strings.

Since the byte/UTF8 thing might be going out the window in Perl 6,
it's hard to speculate about the implications for tr///.  But I think
my main point still stands: We don't have any problem with
reconstructing a (potentially humongous) regex structure at run time,
so I don't see why we should have a problem with reconstructing the
tr/// tables at run time.
Overlapping RFCs 135 138 164
RFC135: Require explicit m on matches, even with ?? and // as
delimiters.

        ?? and // are what makes Perl hard to tokenize.  Requiring
        them to be written m?? and m// would solve this.  (Nathan
        Torkington)

RFC138: Eliminate =~ operator.

        Replace EXPR =~ m/.../ with m/.../ EXPR, and similarly for
        s/// and tr///.  Force an explicit dereference when using
        qr/.../.  Disallow the implicit treatment of a string as a
        regular expression to match against.  (Steve Fink)

RFC164: Replace =~, !~, m//, and s/// with match() and subst()

        Several people (including Larry) have expressed a desire to
        get rid of =~ and !~.  This RFC proposes a way to replace
        m// and s/// with two new builtins, match() and subst().
        (Nathan Widger)

I would like to see these three RFCs merged into one if this is
appropriate.  I am calling on the three authors to discuss in private
email how this may be done.  I hope that the discussion will result
in the withdrawal of at least two of the three RFCs, and that this
private discussion produces a new RFC.  The new RFC should discuss
the points raised by all three existing RFCs, should investigate
several solutions in parallel, and should compare them with one
another and contrast the benefits and drawbacks of each one.

Mark-Jason Dominus                              [EMAIL PROTECTED]
I am boycotting Amazon.  See http://www.plover.com/~mjd/amazon.html
for details.
Re: RFC 110 (v2) counting matches
> Mark-Jason Dominus wrote:
> >
> >     m/.../Count              (instead of m/.../t)
> >     m/.../iCount             (instead of m/.../it)
> >     m/.../Count,i            (instead of m/.../ti)
> >     m/.../Count,Insensitive  (instead of m/.../ti)
>
> Blech, no.  Please.  Less typing good.  More typing bad.
>
> If you're just proposing synonyms, I don't see anyone using these
> besides as mnemnonics.  In which case, the key is just making sure
> that we pick good letters.

I was proposing synonyms for the existing options, and an expanded
namespace for future options.  It is perfectly reasonable for common
flags to get short names and uncommon flags to get long names.  For
example, I think that if the /c option had had only a long name, it
would have imposed very little burden on the community, and it would
have left /c itself available for the more useful application of
producing a count.

> I don't see us running out of letters.

The problem is not with running out of letters.  The problem is with
running out of appropriate letters.  I raised this suggestion in
response to Richard Proctor's observation that /c was unavailable for
'count', and suggesting /t instead.

> Last I checked, m// only takes half a dozen flags.

m// and s/// presently take eight different flags (cegimosx).  In the
past, several others have been proposed, including /r, /t, and /z.

> And so on.  This seems like a much more productive use, otherwise
> we're just wasting characters.

Characters are not in short supply.

Anyway, I will consider the subject closed unless someone produces an
RFC for it.
Re: Proposal for IMPLEMENTATION sections
> Any requirements on how solid an implementation section should be > should be left to the working group chairs. Sorry, I don't understand this. What is the WGC's role here?
Re: Proposal for IMPLEMENTATION sections
> On Wed, Aug 30, 2000 at 02:29:33PM -0400, Mark-Jason Dominus wrote:
> >
> > > Any requirements on how solid an implementation section should
> > > be should be left to the working group chairs.
> >
> > Sorry, I don't understand this.  What is the WGC's role here?
>
> My english native language is?  :-)

I didn't have a problem with the parsing.  I had trouble with the
meaning.

Suppose a WGC establishes a requirement for the solidity of the
implementation section, and receives an RFC that does not meet the
requirements.  What then?
Re: RFC 72 (v1) The regexp engine should go backward as well as forward.
> I am unemcumbered by any knowledge of the regex engine
> implementation,

Yeah.  But I do know something about it, and I have already expressed
my informed opinion.  Having you come along to say that you don't
know anything about it at all, but that you nevertheless think I am
mistaken, is bizarre.

> It might be possible to unroll this imagined inner test outside the
> loop -

Perhaps you could study the code in regexec.c for a little bit of
time, say fifteen minutes, and then make this suggestion again in
light of what you discover.  I have no problem with discussing this
in more detail, but I don't think it would be a good use of my time
to discuss it with you when you haven't looked at the code.
Re: $& and copying: rfc 158 (was Re: RFC 110 (v3) counting matches)
> MD> One of Uri's suggestions in RFC 158 was to compute $& only for
> MD> regexes that have a /k modifier.  This would solve the $&
> MD> problem because Perl would compute $& only when asked to, and
> MD> not for every other regex in the rest of the program.
>
> the rfc was about making $& private to the block with the regex and
> only make the copy if /k is used or you use grabbing.

Making $& local to a block is not going to get a performance
improvement.  The reason $1 is block-localized is for safety, not
speed.  Consider:

        /(...)/;
        foo();
        print $1;

You might have had to worry that foo() would reset $1 somehow.  But
because $1 is block-localized, you can be sure that it will be
restored automatically when foo() returns.  The performance gain in
your RFC comes from the /k option, regardless of whether or not $&
gets block scope.

> a side question i have is whether this extra copy is a runtime
> effect or compile time.  i would imagine runtime with some global
> flag being checked to see if $& is being used.  so you could run
> fast and later load a module uses $& which slows you down.

That doesn't make any sense.  Your proposal says that $& is only set
for regexes that have /k.  Loading a module won't change your non-/k
regexes.

> in any case, i think we have a fair agreement on rfc 158 and i will
> freeze it if there is no further comments on it.

Please add a section that addresses Perl 5 -> Perl 6 translation
issues that will apply if your proposal is adopted.
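The safety property described above is easy to demonstrate: a match inside the called subroutine does not clobber the caller's $1, because the capture variables are dynamically scoped to the enclosing block and restored when it exits.

```perl
sub foo {
    "xyz" =~ /(y)/;     # sets $1 to 'y', but only within this block
}

"abc" =~ /(b)/;         # caller's $1 is 'b'
foo();
print "$1\n";           # still 'b': foo()'s $1 was restored on return
```

This restore happens regardless of $&'s cost, which is why block-localizing $& buys safety but no speed.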
Re: $& and copying: rfc 158 (was Re: RFC 110 (v3) counting matches)
> in any case, i think we have a fair agreement on rfc 158 and i will
> freeze it if there is no further comments on it.

In light of this:

        $&      The string matched by the last successful pattern
                match (not counting any matches hidden within a
                BLOCK or eval() enclosed by the current BLOCK).
                (Mnemonic: like & in some editors.)  This variable
                is read-only and dynamically scoped to the current
                BLOCK.
                                                        (perlvar)

I think you should remove the parts of your proposal about making $&
be autolocalized.

Thanks, Tom.
Re: RFC 72 (v1) The regexp engine should go backward as well as forward.
The big thing I find missing from this RFC is compelling examples. You are proposing a major change to the regex engine but you only have two examples. Both involve only fixed strings and one of them is artificial. I really think you need to discuss in more detail why this feature would be useful. You specifically said that you wanted your feature to be able to match expressions other than fixed strings, but you didn't give any examples of that. > With the proposed extension, you could write: > > m/GAAC(?r)(TTAAG| )/ > > and the regexp engine doesn't have to go looking deep into your regexp to > know where it should start potential matches. OK, now here it's not really clear why you would want to use your feature instead of doing something like this instead: while (m/GAAC/g) { last if substr($_, pos($_)-5, 5) eq 'GAATT'; last if ...; ...; } You could make an argument that yours is more compact, but my version it could easily be wrapped into a subroutine, and it doesn't seem like a particularly common operation, so it doesn't seem like there needs to be another way to say this. Of course, I might have completely missed the point. More and better examples would be a great help here. > As a frivolous illustration, the string > > ABCDEFGHIJKLM > > would be matched by: > > m/FG(?r)EDCB(?f)HIJK(?r)A^(?f)LM$/ If I understand your proposal correctly, it will not change the behavior of the regex if you collect the (?f) and (/r) sesctions together. If this is true, then these all have the same meaning: m/FG(?r)EDCB(?f)HIJK(?r)A^(?f)LM$/ # Your example m/FGHIJK(?r)EDCB(?r)A^(?f)LM$/ m/FGHIJK(?r)EDCBA^(?f)LM$/ m/FGHIJKLM$(?r)EDCBA^/ # Why not just say this? If I am correct, then it doesn't appear that there is ever any reason to have more than one (?r) and one (?f) in a single regex. Also, since there is in effect an implicit (?f) at the beginning of every regex, you don't need a (?f) escape at all, as in the example I just showed. Did I misunderstand your proposal? 
Or did I miss seeing the implication of some example that you didn't include? If I am correct, I think you should eliminate (?f) from your proposal, since it is not useful. > It will be important to know the offset where the match begins, as > well as where it ends (indeed it would be nice to have that info in > Perl5 without having to pay the C performance penalty), > so in addition to C, there might be a function C to > give the start of the match -- or C might return both end and > start offsets in a list context. OK, that's very nice, but you say you don't want the $& penalty. I suspect from your discussion that you don't really understand that $& penalty. There are two parts to the $& penalty. The first part is that maintaining the information for $& has a cost. Maintaining this information for your prepos() function is going to incur an identical cost. The other part of the $& penalty is because $& itself is a global variable, the penalty has to be paid by every regex in the program. This is not a problem with the information in $&; it is a problem with the interface to the information. If the interface were different, $& would not be a problem. For example, if $& were only set on regexes with a /k modifier, as proposed in RFC158, a lot of the pain of $& would go away. Now if something like RFC158 were adopted, then your rationale for prepos() would go away, because length($&) would no longer be particularly expensive. At least, there would be no reason to suppose it would be more expensive than your proposal. However, a prepos() function had exactly the same problem as $& presently has. Whenever Perl did a regex match on any regex in the entire program, it would have no way of knowing whether prepos() might be called much later, so the cost of computing and storing the prepos() information would be incurred. Rather than evading the $& problem, as you suggest, introducing prepos() is going to make it even worse. 
You can evade this problem by making prepos() lexically scoped. For example, prepos() information is only computed for regexes that have the /q modifier on the end, or is only available inside the scope of a 'use prepos' declaration. Either of these would fix this problem.

> I have no idea whether this feature will help people parsing right-to-left
> languages; it seems likely to help with bi-directional texts (see RFC 50).

I was wondering that myself, but I don't think it will, because RTL text is not encoded backwards in the string itself. It only *prints* right-to-left. But I may be mistaken, and I think you should consult with Roman Parparov on this point before submitting the next revision of this RFC.

Finally, some general comments: First, it seems to me that if there were simply a better interface to pos() and to length($&), the need for this feature would go away. Let's
Re: RFC 165 (v1) Allow Variables in tr///
> > The way tr/// works is that a 256-byte table is constructed at compile
> > time that says for each input character what output character is
>
> Speaking of which, what's going to happen when there are more than 256
> values to map?

It's already happened, but I forget the details.
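To illustrate the mechanism the quoted text describes, here is a sketch of the 256-entry-table idea in plain Perl. This is an illustration only, not the real implementation, which is in C inside perl itself.

```perl
# Sketch: tr/a-z/A-Z/ as an explicit 256-entry translation table,
# built once and then consulted per input character.
my @map = map { chr } 0 .. 255;        # identity mapping by default
$map[ord $_] = uc $_ for 'a' .. 'z';   # fill in the translated range
my $out = join '', map { $map[ord $_] } split //, 'hello, World';
# $out is now 'HELLO, WORLD'
```

The point is that the table is fixed-size for byte strings, which is exactly why the question of what happens past 256 values is interesting.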
Re: RFC 165 (v1) Allow Variables in tr///
> Ok, I can understand that. But, what happens when we get to UTF16? Aren't
> we talking about 256k per tr///, then? That seems like a lot of memory
> that is potentially wasted and could lead to some really large footprints.

I don't understand what this discussion has to do with this mailing list, and I don't understand what your point is. tr/// has already been implemented. It uses a 256-byte table. tr/// has already been extended to UTF8 strings, and it takes a certain amount of memory. Perhaps that amount is 256K, perhaps not. If it is, what does that have to do with us here?

If this discussion should go on anywhere, it should be on the perl6-internals list. If you want to register an opinion that 256K bytes is too expensive, you should do that on perl6-internals. It is up to them to figure out if the current implementation is wasteful of memory and to devise a new implementation if so.

For the record, the UTF8 version of tr/// does not use a plain 256K table. It uses a data structure called a 'swash'; this is also the data structure that is used by the UTF8 versions of 'uc' etc., the \p{...} regex escapes, and the others. The swash is based on a hash, and the code is in utf8.c.
Re: RFC 110 (v3) counting matches
> (mystery: how
> can filling in $& be a lot slower than filling in $1?)

It isn't. It's the same. $1 might even be more expensive than $&.

It appears that many people don't understand the problem with $&. I will try to explain. Maintaining the information required by $1 or $& slows down the regex match, possibly by as much as forty to sixty percent, or more. (How much depends on details of the regex and the target string.) For this reason, Perl has an optimization in it so that if you never use $& anywhere in your program, Perl never maintains the information, and every regex in your program runs faster. But if you do use $& somewhere, Perl cannot apply the optimization, and it must compute the $& information for every regex in the program. Every regex becomes much slower. In particular, if you load a module whose author happened to use $&, all your regexes get slower, which might be an unpleasant surprise, since you might not be aware of the cause.

A regex with backreferences is *also* slow. But using backreferences in one regex does not make all the *other* regexes slow. If you have

    /(...)/    # regex 1
    /.../      # regex 2

Perl knows that it must compute the backreference information for regex 1, and knows that it can skip computing the backreference information for regex 2, because regex 2 contains no parentheses. If you use a module that contains regexes that use backreferences, those regexes run slowly, but there is no effect on *your* regexes.

The cost is just as high for backreferences as for $&, but the backreference cost is paid only by regexes that actually need it. The $& cost is paid by every regex in the entire program, whether it uses $& or not. This is because Perl has no way to tell which regexes use $& and which do not. One of Uri's suggestions in RFC 158 was to compute $& only for regexes that have a /k modifier. This would solve the $& problem because Perl would compute $& only when asked to, and not for every other regex in the rest of the program.
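To make the contrast concrete, here is a small sketch (ordinary Perl 5, nothing proposed) of how capturing parentheses confine the cost to the one regex that needs it, where $& would spread it program-wide:

```perl
# Sketch: getting the matched text and its length with a capture.
# Only this regex pays the bookkeeping cost; length($&) would force
# the same cost onto every regex in the program.
my $str = 'abc123def';
my ($match) = $str =~ /([a-z]+\d+)/;
my $len = length $match;   # same information as length($&), paid locally
# $match is 'abc123' and $len is 6
```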
perl6-language-regex summary for 20000831
ry neatly. Nathan Wiger pointed out that this was covered by RFC 164. I pointed out that the implementation would have to construct the translation table at run-time, and that this brings in the same issues as when a regex is constructed at run time. For example, a new tr///o option becomes desirable for the same reason that m//o is desirable. Tom and I had a discussion of these issues, but there do not appear to be any issues here that do not also come up in connection with interpolated regexes. There was a sidetrack about the implementation of tr/// in the presence of Unicode strings.

RFC 166: Additions to regexs (Richard Proctor)

This RFC unfortunately proposes three totally unrelated changes. Richard proposed a 'does not match' operator, with the example that /a(?^b)c/ would match ac, axc, a---c, but not abc, a-b-c, or abbbc. But he did not include a complete enough explanation of what it would do to enable anyone to implement it. (Nobody has been able to produce a sensible description of such an operator, which is probably why Perl doesn't have one yet.) Richard said he would tighten up the definition, but version 2 has not appeared yet.

Richard also proposed a (?) operator that would match the empty string. You would use this in cases like /$foo(?)bar/ where it is inappropriate to abut $foo and bar. It was pointed out that /${foo}bar/ and /$foo(?:)bar/ already work for this purpose. Richard agreed that this was what he wanted.

The third proposal was that (?@foo) be taken to interpolate the string (join "|", @foo). There was no discussion of this.

RFC 170: Generalize =~ to a special-purpose assignment operator (Nathan Wiger)

This is probably the most interesting and far-reaching RFC proposed this week, but there was essentially no discussion.

Mark-Jason Dominus [EMAIL PROTECTED]
I am boycotting Amazon. See http://www.plover.com/~mjd/amazon.html for details.
Re: perl6-language-regex summary for 20000831
> On Thu, Aug 31, 2000 at 12:34:05PM -0400, Mark-Jason Dominus wrote:
> >
> > perl6-language-regex
> >
> > Summary report 20000831
> >
> > RFC 72: The regexp engine should go backward as well as
> > forward. (Peter Heslin)
> >
> > This topic did not attract much discussion until the very end of the
> > week. I sent the author a detailed critique, to which he has not
> > responded.
>
> As the author in question, I would like to note in my defense that there

I wasn't trying to attack you, so no defense is required. I was just reporting on the current status of the RFC.

> I posted a detailed response (within 24 hours) and have now posted a
> revised RFC.

You did, and it was excellent. Thanks very much.
Re: XML/HTML-specific ?< and ?> operators? (was Re: RFC 145 (alternate approach))
> >>>>> "Mark-Jason" == Mark-Jason Dominus <[EMAIL PROTECTED]> writes:
>
> Mark-Jason> I have some ideas about how to do this, and I will try to
> Mark-Jason> write up an RFC this week.
>
> "You want Icon, you know where to find it..." :)

That's exactly my motivation. It seems to me that trying to cram Icon into regexes isn't working well, but that a small transplant of Icon into the core language might suffice instead of a lot of cramming.
Re: XML/HTML-specific ?< and ?> operators? (was Re: RFC 145 (alternate approach))
> I think what is needed is something along the line of :

Joe McMahon and I are working on something along these lines.
Re: What's in a Regex (was RFC 145)
> > 2. Many people - including Larry - have voiced their desire
> > to see =~ die a horrible death
>
> Please provide a look-up-able reference to Larry's saying that he
> wanted to =~ to die horrible death.

Larry said:

# Well, the fact is, I've been thinking about possible ways to get rid
# of =~ for some time now, so I certainly don't mind brainstorming in
# this direction.

That is in <[EMAIL PROTECTED]> which is archived at http://www.mail-archive.com/perl6-language-regex@perl.org/msg3.html

I think Nathan was exaggerating here, but maybe he knows something I don't.
Re: XML/HTML-specific ?< and ?> operators? (was Re: RFC 145 (alternate approach))
> >...My point is that I think we're approaching this
> >the wrong way. We're trying to apply more and more parser power into what
> >classically has been the lexer / tokenizer, namely our beloved
> >regular-expression engine.

I've been thinking the same thing. It seems to me that the attempts to shoehorn parsers into regex syntax have either been unsuccessful (yielding an underpowered extension) or illegible or both.

An approach that appears to have been more successful is to find ways to integrate regexes *into* parser code more effectively. Damian Conway's Parse::RecDescent module does this, and so does SNOBOL. In SNOBOL, if you want to write a pattern that matches balanced parentheses, it's easy and straightforward and legible:

    parenstring = '(' *parenstring ')' | *parenstring *parenstring | span('()')

(span('()') is like [^()]* in Perl.) The solution in Parse::RecDescent is similar.

Compare this with the solutions that work now:

    # man page solution
    $re = qr{
              \(
                 (?:
                    (?> [^()]+ )    # Non-parens without backtracking
                  |
                    (??{ $re })     # Group with matching parens
                 )*
              \)
            }x;

This is not exactly the same, but I tried a direct translation:

    $re = qr{   \( (??{$re}) \)
              | (??{$re}) (??{$re})
              | (?> [^()]+)
            }x;

and it looks worse and dumps core. This works:

    qr{ ^
        (?{ local $d=0 })
        (?:  \( (?{$d++})
           | \) (?{$d--}) (? (?{$d<0}) (?!) )
           | (?> [^()]* )
        )*
        (? (?{$d!=0}) (?!) )
        $
      }x;

but it's rather difficult to take seriously. The solution proposed in the recent RFC 145:

    /([^\m]*)(\m)(.*?)(\M)([^\m\M]*)/g

is not a lot better. David Corbin's alternative looks about the same.

On a different topic from the same barrel, we just got a proposal that ([23,39]) should match only numbers between 23 and 39. It seems to me that rather than trying to shoehorn one special-purpose syntax after another into the regex language, which is already overloaded, it would be better to try to integrate regex matching better with Perl itself.
Then you could use regular Perl code to control things like numeric ranges. Note that at present, you can get the effect of [(23,39)] by writing this:

    (\d+)(?(?{$1 < 23 || $1 > 39})(?!))

which isn't pleasant to look at, but I think it points in the right direction, because it is a lot more flexible than [(23,39)]. If you need to fix it to match 23.2 but not 39.5, it is straightforward to do that:

    (\d+(\.\d*)?)(?(?{$1 < 23 || $1 > 39})(?!))

The [(23,39)] notation, however, is doomed. All you can do is propose Yet Another Extension for Perl 7.

The big problem with

    (\d+)(?(?{$1 < 23 || $1 > 39})(?!))

is that it is hard to read and understand. The real problem here is that regexes are single strings. When you try to compress a programming language into a single string this way, you end up with something that looks like Befunge or TECO. We are going in the same direction here. Suppose there were an alternative syntax for regexes that did *not* require that everything be compressed into a single string?

Rather than trying to pack all of Perl into the regex syntax, bit by bit, using ever longer and more bizarre punctuation sequences, I think a better solution would be to try to expose the parts of the regex engine that we are trying to control. I have some ideas about how to do this, and I will try to write up an RFC this week.
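To illustrate the direction I mean: once the depth-counting is allowed to be ordinary Perl code instead of regex punctuation, it becomes legible. This is a sketch only, for comparison with the (?{ local $d=0 }) regex, not a proposed interface:

```perl
# Sketch: the same balanced-parenthesis test, as plain Perl.
sub is_balanced {
    my ($s) = @_;
    my $depth = 0;
    for my $c (split //, $s) {
        $depth++ if $c eq '(';
        if ($c eq ')') {
            $depth--;
            return 0 if $depth < 0;   # closed more than we opened
        }
    }
    return $depth == 0;               # everything opened was closed
}
# is_balanced('(a(b)c)') is true; is_balanced('(a))(') is false
```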
perl6-language-regex summary for 20000911
perl6-language-regex

Summary report 20000911

RFC 72: The regexp engine should go backward as well as forward. (Peter Heslin)

The author sent a revised version of the RFC. There seem to be two ideas here:

1. The lookbehind assertions should work for variable-length patterns. (At present they match only fixed-length strings.)

2. The programmer would be able to optimize the regex match by directing the engine to an unlikely part of the pattern first. For example, if you are looking for 'e.*z', and you write /eX*z/ the regex engine might look first for an e, then some following X's, in hopes of finding a 'z' afterwards. If there are many e's and few z's, this may result in many false starts. Peter's idea seems to be that with /z(?r)X*e/ the regex engine would look first for a 'z', and then for *preceding* 'X*', and then for an 'e' before that, which might be faster.

Bart Lateur said that the regex engine should do this sort of optimization automatically. (In fact it already does in some cases.) Hugo pointed out that a variable-length lookbehind might be more powerful.

RFC 93: Regex: Support for incremental pattern matching (Damian Conway)

No discussion this week.

RFC 110: counting matches (Richard Proctor)

Richard released version 4 of the RFC, which just adds a couple of personal opinions about how

    $number = () = m/.../g;

is ugly. He says that he is going to add some suggestions from Hugo van der Sanden and then freeze the RFC at the end of the week.

RFC 112: Assignment within a regex (Richard Proctor)

No discussion.

RFC 138: Eliminate =~ operator. (Steve Fink)

Steve withdrew this RFC in favor of RFC 164.

RFC 144: Behavior of empty regex should be simple (Mark Dominus)

Frozen.

RFC 145: Brace-matching for Perl Regular Expressions (Eric Roode)

David Corbin suggested an alternative syntax. This sparked a long series of syntactic suggestions. Nathan Wiger suggested a special syntax for matching XML-style open and close tags. (However, he did not submit an RFC.)
I *still* think that all the proposals for this functionality are too limited and too specific. Others seem to be thinking in the same direction. Jonathan Scott Duff asked

    What if we just provided deep enough hooks into the RE engine that
    specialized parsing constructs like these could easily be added by
    those who need them?

Michael Maraist said that more powerful and convenient parsing should be incorporated into Perl, not into the regex engine. Tom Christiansen expressed agreement. Damian Conway suggested that people look at Parse::RecDescent and suggested that this and other parsing modules were the right direction to go. I sent a note about SNOBOL syntaxes, and promised that Joe McMahon and I would write an RFC, but it has not appeared yet.

RFC 150: Extend regex syntax to provide for return of a hash of matched subpatterns (Kevin Walker)

Kevin reported back on Python's version of this feature. He said that the major deficiency in his proposal is that there is nothing analogous to the \1 in /(.*)\1/. He promised a revised RFC, which has not appeared yet.

RFC 158: Regular Expression Special Variables (Uri Guttman)

It was pointed out that part of Uri's proposal was to make $& block scoped, but $& is already block scoped. Uri has not sent a revised RFC.

RFC 164: Replace =~, !~, m//, s///, and tr// with match(), subst(), and trade() (Nathan Wiger)

Surprisingly, there was no discussion about this RFC this week.

RFC 165: Allow variables in tr/// (Richard Proctor)

Richard suggested that he freeze the RFC, but I don't believe he has sufficiently taken into account the last round of discussion. I have asked for a revision.

RFC 166: Additions to regexs (Richard Proctor)

Richard plans to drop two of the three items here, and retain only the one that makes (?@foo) equivalent to (??{ join '|', @foo }). I pointed out that this is already possible with a compile-time overloaded constant, and provided a demonstration module.
RFC 170: Generalize =~ to a special-purpose assignment operator (Nathan Wiger)

Still little discussion of this.

RFC 197: Numberic Value Ranges In Regular Expressions (David Nichol)

There was no discussion.

RFC 198: Boolean Regexes (Richard Proctor)

Richard says that this is a development of the 'negated expression' idea that he dropped from RFC 166. Discussion from RFC 166 pointed out that Richard had not said clearly what he wanted his (?^...) proposal to do. Richard proposed a definition, and I followed up pointing out that even his proposed definition did not do what he wanted. Richard then said he would tighten up the definition, but it appears that he didn't do this. However, there has been no discussion of this proposal.
Re: RFC 197 (v1) Numberic Value Ranges In Regular Expressions
I have some trouble understanding just what the proposal is, since the RFC doesn't contain any examples. But I gather that you want to usurp *both* the (...) and the [...] notation for numeric ranges. This would change the meaning of any code that happened to contain a regex like this:

    /(12.3,45.67)/

That seems to me like a very bad idea. Usurping /[...]/ isn't quite as awful an idea, since patterns like /[12,34]/ are probably rare.

The behavior you want is already possible without an extension:

    /(\d+\.?\d*)                      # look for a number
     (?  (?{$1 < 37.3 || $1 > 200})   # If it's out of range
         (?!)                         # ...then backtrack
     )
    /x

I agree that this isn't really pretty, but

1. the proposed notation is really nasty, since it overloads existing well-established notations, and

2. I think a better response would be to find a way to use the existing features with a prettier notation, since they are much more generally applicable than your proposed extension.
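For what it's worth, the range test above runs as-is. This sketch assumes a perl recent enough to allow (?{...}) code blocks inside a conditional in a pattern:

```perl
# The in-range pattern from above, exercised directly: match a number,
# then veto the match with (?!) if the captured value is out of range.
my $in_range = qr/(\d+\.?\d*)(?(?{ $1 < 37.3 || $1 > 200 })(?!))/;
my $hit  = ("reading: 42.5" =~ $in_range) ? $1 : undef;
my $miss = ("reading: 12"   =~ $in_range) ? $1 : undef;
# $hit is '42.5'; $miss is undef, because 12 is below the range
```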
Re: RFC 72 (v1) The regexp engine should go backward as well as forward.
> Simply put, I want variable-length lookbehind.

Why didn't you simply propose that the (?<...) operator be fixed to support variable-length expressions? Why so much additional machinery?
Re: RFC 72 (v1) The regexp engine should go backward as well as forward.
> As to your contention that "at best" (?r) will defeat many present
> optimizations, can you tell me why this will necessarily be so in the
> new engine?

Let me explain my thinking along these lines. I've made a number of assumptions, which may not be correct, and certainly aren't obvious. I have been supposing all along that the Perl 6 regex engine will incorporate the Perl 5 regex engine directly. This may turn out to be wrong, but I did think it through. I think this for several reasons:

1. Writing even a simple regex engine is nontrivial. Writing a regex engine as fast and as complicated as Perl's is very difficult. Even Perl's regex engine was not written from scratch; it was based on code supplied by Henry Spencer.

2. Very few people are available who are capable of reimplementing Perl's regex engine. The people on this list are clearly not going to do it. According to someone on this list, some of the people here are not even competent to look at the regex engine code. More to the point, I don't know of anyone who has volunteered, and when I try to think of candidates, nobody likely comes to mind.

3. Regexes are one of Perl's most essential features. If the regexes are slow, that is a big problem for Perl. The existing regex engine is fast, partly because it has years of optimizations in it. To start over would be to throw that all away.

4. People have tried implementing regex engines along different principles before, and have not been able to find anything faster than the current strategy. For example, in Perl regexes are compiled into fixed-size bytecodes; when a regex (such as /a(b|c)d/) contains a branch, the branch is expressed as a bytecode offset. It might seem that one could do better: Instead of using fixed-size bytecodes, compile each regex operator as a C structure with a pointer to the struct for the next opcode. A branch operator will have pointers to two other structures, instead of to only one. People have tried this more than once.
It turns out that this is slower than the bytecode approach.

5. Larry has already said that he expects that much of the initial Perl 6 code will actually be Perl 5 code, just as much of the initial Perl 5 code was actually Perl 4 code. (See http://www.mail-archive.com/perl6-language@perl.org/msg01194.html)

Perhaps the Perl 6 engine will be a fresh reimplementation, but I do not think that that is very likely, because there is no good reason to do it and because it does not appear that there is anyone available and qualified who wants to do it. Even if the Perl 6 engine *is* a fresh reimplementation, it seems likely that it will operate on the same principles as the Perl 5 engine.

So I have been supposing that the Perl 6 regex engine will probably not be rewritten from scratch, and if it *is* rewritten from scratch, it will probably still look a lot like the Perl 5 regex engine. As I said, this might be mistaken, but I think that it's the way to bet.
Re: RFC 165: Allow Variables in tr/// (post hugo)
> I propose adding the first para as a note and moving RFC to frozen soon.

You did not address my points about tr///o and related issues. I suggest that you submit a revised RFC and then freeze it a week afterwards if there is still no discussion.
Re: RFC 166 (v1) Additions to regexs
> (?@foo) is sort of equivalent to (??{join('|',@foo)}), ie it expands into a
> list of alternatives. One could possibly use just @foo, for this.

It just occurs to me that this is already possible. I've written a module, 'atq', such that if you write

    use atq;

then your regexes may contain the sequence (?\@foo) with the meaning that you asked for. (The \ is necessary here because (?@foo) already has a meaning under Perl 5, and I think your proposal must address this.)

Anyway, since this is possible under Perl 5 with a fairly simple module, I wonder if it really needs to be in the Perl 6 core. Perhaps it would be better to propose that the module be added to the Perl 6 standard library?

Module is at http://www.plover.com/~mjd/perl/atq.tgz
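For anyone following along, the expansion under discussion is nothing more than this, written out by hand (the quotemeta is my own addition for safety; the proposal doesn't say whether metacharacters in the list would be quoted):

```perl
# What (?@foo) would expand to: an alternation of the list elements.
my @foo = ('cat', 'dog', 'fish');
my $alt = join '|', map quotemeta, @foo;   # 'cat|dog|fish'
my ($matched) = ('hotdog stand' =~ /($alt)/);
# $matched is 'dog'
```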
Re: $& and copying: rfc 158 (was Re: RFC 110 (v3) counting matches)
> > in any case, i think we have a fair agreement on rfc 158 and i will
> > freeze it if there is no further comments on it.
>
> I think you should remove the parts of your proposal about making $& be
> autolocalized.

If you're not planning to revise your RFC, let me know so that I can ask the librarian to mark it as withdrawn.
Re: RFC 110 counting matches (post Hugo)
> I propose adding this note. His preference for the working of
> /t and /g seems the most appropriate. Unless I hear any further
> discussion I propose moving this RFC to frozen this week.

Please post a complete, revised version of the RFC *before* you freeze it.
Re: RFC 166 (v1) Additions to regexs
> > (The \ is necessary here because (?@foo) already has a meaning under
> > Perl 5, and I think your proposal must address this.)
>
> (?@foo) has no meaning; I checked the code.

I don't know what you mean, but you're mistaken, because it means to interpolate @foo as in a double-quoted string.
Re: RFC - Prototype RFC Implementations - Seperating the men from the boys.
> Bad reasons
> I do not have time.
> I do not have the tuits.

I think it would be a step in the right direction if the WG chairs actually required RFC authors to maintain their RFCs.
What good are WG chairs?
> I think it would be a step in the right direction if the WG chairs
> actually required RFC authors to maintain their RFCs.

I also think it would be a step in the right direction if the WG chairs wrote up summaries like they said they would. They obviously don't. Frankly, I don't really see what the WG chairs are for, unless maybe it's to play list mom.
Re: RFC 166 (v1) Additions to regexs
> On Tue, 12 Sep 2000 19:01:35 -0400, Mark-Jason Dominus wrote:
>
> >I don't know what you mean, but you're mistaken, because it means to
> >interpolate @foo as in a double-quoted string.
>
> Which is precisely the meaning he wants for it, with $" set to '|'.

"Which is precisely the meaning he wants for it, except for the parts that are different."

So it presently has a different meaning. So he should say so in the section of the RFC about migration issues. Why is this so complicated?
Re: XML/HTML-specific ?< and ?> operators?
> :Anyway, Snobol has a nice heuristic to prevent infinite recursion in
> :cases like this, but I'm not sure it's applicable to the way the Perl
> :regex engine works. I will think about it.
>
> It is probably worth adding the heuristic above: anytime you recurse
> into the same re at the same position, there is an infinite loop.

That is basically it, except that in Snobol it is inside out: Each recursively interpolated pattern is assumed to match a string of at least length 1, and if the remaining part of the target string isn't sufficiently long to match the rest of the pattern after recursion, then the recursion is skipped.
Re: XML/HTML-specific ?< and ?> operators?
> : it looks worse and dumps core.
>
> That's because the first non-paren forces it to recurse into the
> second branch until you hit REG_INFTY or overflow the stack. Swap
> second and third branches and you have a better chance:

I think something else goes wrong there too.

> $re = qr{...}
> (I haven't checked that there aren't other problems with it, though.)

Try this:

    "(x)(y)" =~ /^$re$/;

This should match, but it dumps core. I don't think there is infinite recursion, although I might be mistaken.

Anyway, Snobol has a nice heuristic to prevent infinite recursion in cases like this, but I'm not sure it's applicable to the way the Perl regex engine works. I will think about it.
Re: what (?x) are in use? (was RFC 145 (alternate approach))
> In theory, all letters should be reserved to map to future flags for
> the same reason.

My recollection is that Larry specifically mandated this, and that's why (?p...) was changed to (??...) in 5.6.0.
Re: Conversion of undef() to string user overridable for easy debugging
> This reminds me of a related but rather opposite desire I have had
> more than once: a quotish context that would be otherwise like q() but
> with some minimal extra typing I could mark a scalar or an array to be
> expanded as in qq().

I have wanted that also, although I don't remember why just now. (I think I have some notes somewhere about it.) I will RFC it if you want.

Note that there's prior art here: It's like Lisp's backquote operator.
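In the meantime the effect can be approximated with q() plus a marker convention. The <<...>> marker below is invented purely for the illustration; the real proposal would presumably pick better syntax:

```perl
# Mostly-literal quoting, with expansion only where explicitly marked.
my %vars = (name => 'world');
my $s = q(hello <<name>>, but $this and @that stay literal);
$s =~ s/<<(\w+)>>/$vars{$1}/g;   # expand only the marked names
# $s is now 'hello world, but $this and @that stay literal'
```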
Re: RFC 208 (v2) crypt() default salt
> =head1 TITLE
>
> crypt() default salt
>
> =head1 VERSION
>
> Maintainer: Mark Dominus <[EMAIL PROTECTED]>
> Date: 11 Sep 2000
> Last Modified: 13 Sep 2000
> Mailing List: [EMAIL PROTECTED]
> Number: 208
> Version: 2
> Status: Developing

If there are no objections, I will freeze this in twenty-four hours.
Re: types that fail to suck
> You talked about Good Typing at YAPC, but I missed it. There's a
> discussion of typing on perl6-language. Do you have notes or a
> redux of your talk available to inform this debate?

http://www.plover.com/~mjd/perl/yak/typing/TABLE_OF_CONTENTS.html
http://www.plover.com/~mjd/perl/yak/typing/typing.html

Executive summary of the talk:

1. Type checking in C and Pascal sucks.

2. Just because static type checking is a failure in C and Pascal doesn't mean you have to give up on the idea.

3. Languages like ML have powerful compile-time type checking that is successful beyond the wildest imaginings of people who suffered from Pascal.

4. It is probably impossible to get static, ML-like type checking into Perl without altering it beyond recognition.

5. However, Perl does have some type checking mechanisms, and more are coming up.

Maybe I should also mention that last week I had a dream in which I had a brilliant idea for adding strong compile-time type checking to Perl, but when I woke up I realized it wasn't going to work.
Re: RFC 166 (v2) Alternative lists and quoting of things
> (?Q$foo) Quotes the contents of the scalar $foo - equivalent to
> (??{ quotemeta $foo }).

How is this different from \Q$foo\E ?
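For comparison, here is what \Q$foo\E already does at interpolation time, which appears to be the same effect (?Q$foo) is proposed to have:

```perl
# \Q...\E quotes regex metacharacters in the interpolated variable.
my $foo = 'a.c';
my $exact = ('a.c' =~ /^\Q$foo\E$/) ? 1 : 0;   # dot is literal: matches
my $loose = ('abc' =~ /^\Q$foo\E$/) ? 1 : 0;   # 'abc' no longer matches
# $exact is 1, $loose is 0
```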
Re: 'eval' odd thought
> eval should stay eval.

Yes, and this is the way to do that. When you translate a script, the translator should translate things so that they have the same meanings as they did before. If it doesn't also translate eval, then your Perl 5 scripts will be using the Perl 6 eval, which isn't what you wanted.
Re: RFC 308 (v1) Ban Perl hooks into regexes
> On Mon, Sep 25, 2000 at 08:56:47PM +0000, Mark-Jason Dominus wrote:
> > I think the proposal that Joe McMahon and I are finishing up now will
> > make these obsolete anyway.
>
> Good! The less I have to maintain the better...

Sorry, I meant that it would make (??...) and (?{...}) obsolete, not that it will make your RFC obsolete. Our proposal is agnostic about whether (??...) and (?{...}) should be eliminated.
Re: RFC 308 (v1) Ban Perl hooks into regexes
I think the proposal that Joe McMahon and I are finishing up now will make these obsolete anyway.