RE: Regex and Matched Delimiters
> He then went on to describe something I didn't understand at all.
> Sorry.

Few corrections to what you wrote:

To avoid the problem of extending {} to support new features with a character 'x', without breaking stuff that might have an 'x' immediately after the '{', my proposal is to require one space after the { before the real regex appears. So to correct the example I wrote of /{a|b|c}+/, it would become /{ a|b|c}+/. It looks a bit weird if you're accustomed to perl5's behavior of (?:).

{ \ } would then match a single space. { } would do nothing, since the second space falls under the whitespace-insensitive regex rule.

Now, since we require a space, all the characters before this space now become 'special' in some form. This fact allows us to add new special characters and map them to functionality, if perl doesn't already do that. For example, I would register | to be:

    sub zerowidth ($regex) {
        return <<"EOF";
    push \@pos, pos();
    regex_run $( qr/$regex/ );
    pos() = pop \@pos;
    EOF
    }

And conversely, _ would be written as:

    sub regularwidth ($r) { return "regex_run $( qr/$r/ )" }

This would allow me to do whacky things, like register these:

    sub plus ($r) {return "\$level++;regex_run $( qr/$r/ )"}
    sub minus($r) {return "\$level--;check();regex_run $( qr/$r/ )"}
    sub check {assert($level>0)}

    { {+ \(} | {- \)} | . } ({ check() })

Brent and I also disagreed on the use of sexegers. japhy has done more thinking about this than either of us have, so perhaps we should just let him weigh in on the issue. I proposed that {< be a sexeger, whereas he prefers {< be a lookbehind. I'll use the former for the rest of this discussion, since on IRC we had to agree to disagree on it. Regardless, having support for sexegers supports all of the behavior of lookbehinds, since lookbehinds are just a constant-string, and could never be a regex in Perl5.

I still like the way lookbehinds work, and am not suggesting that they disappear entirely, but rather that they be changed into an underlying sexeger form.

    sub b ($reg) {
        my $ger = reverse $reg;
        return "run_regex qr/{<|= \Q$ger\E}/"
    }

The following perl5 regex:

    /(?<=foo)bar/

is now equivalent to:

    /(b foo)bar/

> The only major drawback I can see to that is the naïve user might type
> {.*?}+ expecting a bunch of text in bold tags and getting a

Sorry I forgot to make that clearer. The above regex would have to be written as { .*}+ to work properly, specifying that there are no special tokens.

> Here's how it works:
> -If the code returns undef, we backtrack.
> -If the code returns the empty string, we move on.
> -If the code returns anything else, we interpolate that into the
> regex.
>
> So, we now just have ({}).

({print "hello"}) will, unfortunately, be really weird. Since it returns 1, the block will return 1. We'd have to force-specify a return value of "". While simplifying the set of operators is good, and I want to do a bunch of that myself, we should probably offer a way to perform 'execute with no interpolated regex' behavior of before, somehow built up on top of the existing ({}) operator.

Reflecting on it all a bit, if we're willing to make a larger sacrifice in backwards compatibility, it might make things make more sense.

- {} would be the code operator, which was specified up above as ({}). This makes more sense, imo, since {} is traditionally used for blocks.
- () would have all the special semantics described for {} in this thread.

The default for () could still be capturing, so ( a*) performs capturing on /a*/. We'd then have to define another pair of symbols for turning capturing on and off. All instances of Perl5's (blah) would convert to ( blah), and all instances of the special operators in perl5 a la (?@#blah) would translate as they did before, but also specifying the 'don't capture within these parens' special identifier.
Basically, I'm trying to propose a system which makes all the regex stuff become orthogonal. Rather than creating a bunch of hardcoded types of (?>= regex operators, instead define small functionalities which can be combined in ways to emulate these tried and true constructs. Brent, let me know if I'm still spouting gibberish on this email. :) Mike Lambert
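
A rough illustration of the zerowidth idea from the message above (save the match position, run the sub-match, restore the position), written as a minimal C sketch. The names matcher_fn, match_foo and zero_width are hypothetical and invented for this illustration; nothing here is Perl or Parrot API, and the real proposal generates Perl code rather than calling C functions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical matcher type: tries to match at s[*pos] and advances
 * *pos past the matched text on success. Returns 1 on match, 0 otherwise.
 * Callers are assumed to keep *pos <= strlen(s). */
typedef int (*matcher_fn)(const char *s, size_t *pos);

/* Example matcher (an assumption for the demo): consumes the literal "foo". */
int match_foo(const char *s, size_t *pos)
{
    if (strncmp(s + *pos, "foo", 3) == 0) {
        *pos += 3;
        return 1;
    }
    return 0;
}

/* Zero-width wrapper: remember the position, run the matcher, then put
 * the position back, so the result is reported but no input is consumed.
 * This mirrors the push/run/pop sequence in the zerowidth sub. */
int zero_width(matcher_fn m, const char *s, size_t *pos)
{
    size_t saved = *pos;   /* push @pos, pos()   */
    int ok = m(s, pos);    /* run the sub-regex  */
    *pos = saved;          /* pos() = pop @pos   */
    return ok;
}
```

Calling zero_width(match_foo, "foobar", &p) with p at 0 reports a match and leaves p at 0, whereas calling match_foo directly moves p to 3. That difference is all the '|' special character would add over the plain '_' form.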
Re: Roadmap for Parrot
According to [EMAIL PROTECTED] (Dan Sugalski):
>At the moment, we don't have to support cascading lexical
>scratchpads--since we know at compile time which variables we're
>accessing and where they come from, we can install trampoline entries
>in the current scope's scratchpad and not have to search outward at
>runtime.

Once you start down that path, forever will it dominate your closure implementation. I suggest you do lexical scopes right from the start. Surely they're not that much harder than a trampoline. Are they?

PS: I'm back. }:-)
--
Chip Salzenberg - a.k.a. -<[EMAIL PROTECTED]>
"It furthers one to have somewhere to go."
Please rename 'but' to 'has'.
Larry,

Please don't use 'but' to associate runtime properties to things. Please call it 'has'.

First, but is just strange. I have a thing and I want to tell you it is red, so I say 'but'. Huh?

Using 'has' makes a nice parallel with 'is' for compile time properties: What you are is determined at compile time, what you have is determined at run time.

Daniel
Re: Call stack manipulation?
G'day all.

On Fri, Apr 19, 2002 at 09:55:46AM +0100, Piers Cawley wrote:
> It's fine for partial continuations certainly, but less fine if you
> want to implement full continuations which require you to save the
> state of the entire stack. I was hoping I'd find a way to do this
> without having to wait for Parrot to get its own continuations.

I suspect that using relative offsets for jump instructions might have been a bad idea in retrospect. Sure it's faster, but now we have prederef, which could turn absolute offsets into relative offsets at run time.

> If
>
>     set I0, LABEL
>
> sets I0 to an address which I can C to from anywhere then I have
> what I need.

One possible solution is to introduce a new op:

    inline op codeoffset(inconst INT, out INT)

The idea is to turn a relative offset into an absolute address which can then be passed to "goto ADDRESS()". Another possibility is:

    inline op adjustoffset(in INT, inconst INT, inconst INT, out INT) {
        $4 = $1 + $2 - $3;
        goto NEXT();
    }

The idea is that this:

    adjustoffset I0, LABEL1, LABEL2, I1

will adjust an offset I0 (which is relative to LABEL1) to make it relative to LABEL2. That way, you can canonicalise all your offsets and just re-adjust them before jumping.

(Note: Yes, this could be done with two existing ops. The benefit of making it another op is that a JIT compiler can easily compile it away if it chooses to use only absolute offsets.)

> What's SECD by the way?

http://www.wikipedia.com/wiki/SECD+machine

If you have a well-stocked university library handy:

    @book{Henderson80,
        author = {Peter Henderson},
        title = {{Functional Programming -- Application and Implementation}},
        publisher = {Prentice/Hall International},
        year = 1980,
        series = {Series in Computer Science},
        topics = {FP - General,FP - Implementation}
    }

Cheers,
Andrew Bromage
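
The arithmetic behind the proposed adjustoffset op can be checked with a small C sketch. This is only an illustration of the formula $4 = $1 + $2 - $3 from the message above; the function name adjust_offset and the use of plain long values for code positions are assumptions made here, not part of Parrot.

```c
/* Re-base an offset. 'off' is measured from 'old_base'; the result is the
 * same target measured from 'new_base'. This is the $4 = $1 + $2 - $3
 * step of the proposed op, with positions modelled as plain integers
 * (an assumption for this sketch). */
long adjust_offset(long off, long old_base, long new_base)
{
    return off + old_base - new_base;
}
```

For example, a target 10 slots past a label at position 100 sits at absolute position 110. Re-based against a label at position 104, the offset becomes 6, and 104 + 6 still reaches 110, so the jump target is unchanged.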
Re: [PATCH] Op metadata
G'day all.

On Fri, Apr 19, 2002 at 05:28:12PM -0300, Daniel Grunblatt wrote:
> Add me to the list, because I'm writing the jit optimizer and ran into
> the same problem, we must add some metadata otherwise I will end up
> hard-coding all the information deep into the optimizer and that is a Bad
> Thing (tm). I don't know if this is the best way to solve this but I
> rather like it.

I'm leaning towards invoking the politician's syllogism:

    We must do something.
    This is something.
    Therefore we must do this.

We can always delete it later if (read: when) we think of something better.

Cheers,
Andrew Bromage
Re: Regex and Matched Delimiters
Let me see if I understand the final version of your (Mike's) suggestions and where it appears to be headed:

Backwards compatibility: perl5 extended syntax still works in perl6 if one happens to use it.

Forward conversion: Automatic conversion of relevant perl5 regex syntax to perl6 is simple.

New extension syntax:
1. Syntax is (ops data).
2. There are a bunch of built-in ops, but user can define new ones.
[2c. What about ( data) or (ops data) normally means non-capturing, ($2 data) captures into $2, ($foo data) captures into $foo?]

Rationalized ops syntax: Ops string consists of arbitrarily ordered individual op characters. (eg '<' signifies a look behind, '!<' signifies fail if look behind match.)

Embedded code: Code is inserted using {} with something other than digits in them.

(Other stuff, such as sexegers, ignored.)

--
ralph
Re: Please rename 'but' to 'has'.
I agree 'but' seems a tad odd, and I like the elegance of your suggestion at first sight. However...

> First, but is just strange. I have a thing and I want to tell you it is
> red, so I say 'but'. Huh?

    banana but red;
    "foo" but false;

According to Larry, run time properties will most often be used to contradict a built-in or compile time property. If he is right about the dominant case being a contradiction, 'but' works better for me than anything else I can think of, including 'now' (explained below).

Even if usage to contradict a built-in or compile time property is not the most common form of usage, it is still arguably the case that if one keyword is to cover both cases (contradict or not), then having the keyword "warn" that contradiction may have occurred is better than having the keyword indicate to a newbie that there is nothing to worry about, as would be the case with 'has'.

Further, even if the "warn" notion is deemed unimportant, 'has' is still far from an ideal fit in many cases:

    banana has red;
    "foo" has false;

Yet another issue is use of 'is' in a conditional:

    if ($foo is red) ...

This would be nice, and would work nicely if one uses a different keyword for runtime properties, but works best if that other word is more consistent with the notion of 'is' than 'has' is.

One plausible middle ground word off the top of my head that is odd in its own special way would be 'now':

    banana now red;
    "foo" now false;
    banana now foo;
    banana now tainted;

I read 'now' as somewhat suggestive of changing something.

--
ralph
Re: Regex and Matched Delimiters
> [2c. What about ( data) or (ops data) normally means non-capturing, > ($2 data) captures into $2, ($foo data) captures into $foo?] which is cool where being explicit simplifies things, but ain't where implicit is simpler. So, maybe add an op ('$'?) or switch that makes parens capturing by default, ie as per perl5. -- ralph
Re: Please rename 'but' to 'has'.
On 4/20/02 3:02 PM, "Me" <[EMAIL PROTECTED]> claimed: > banana now red; > "foo" now false; > banana now foo; > banana now tainted; > > I read 'now' as somewhat suggestive of changing something. I actually rather like this keyword. It not only suggests that something has changed, but that it has changed at a particular time -- runtime. Compile time properties just *are* (is), no matter what, unless and until you say, at runtime, that it is *now* something else. -- David Wheeler AIM: dwTheory [EMAIL PROTECTED] ICQ: 15726394 http://david.wheeler.net/ Yahoo!: dew7e Jabber: [EMAIL PROTECTED]
[PATCH] Fix another GC segfault
I was playing around a bit with the set_keyed and get_keyed ops and found that this:

    new P0, PerlArray
    set I0, 0
    LOOP: set_keyed P0, I0, I0
    inc I0
    lt I0, 1, LOOP
    end

causes Parrot to segfault. The culprit appears to be this bit of code in trace_active_PMCs in resources.c:

    else {
        /* The only thing left is "buffer of PMCs" */
        Buffer *trace_buf = current->data;
        PMC **cur_pmc = trace_buf->bufstart;
        /* Mark the damn buffer as used! */
        trace_buf->flags |= BUFFER_live_FLAG;
        for (i = 0; i < trace_buf->buflen; i++) {
            if (cur_pmc[i]) {
                last = mark_used(cur_pmc[i], last);
            }
        }
    }

The problem is that trace_buf->buflen is the size of the buffer, and not the number of PMCs contained in it, so the loop reads out of the end of cur_pmc and into garbage data. The patch below fixes this, and also adds a test-case to perlarray.t to stop it from coming back.

Simon

--- resources.c.old Sat Apr 20 17:55:28 2002
+++ resources.c Sat Apr 20 17:58:45 2002
@@ -462,9 +462,10 @@
     /* The only thing left is "buffer of PMCs" */
     Buffer *trace_buf = current->data;
     PMC **cur_pmc = trace_buf->bufstart;
+    UINTVAL no_of_pmcs = trace_buf->buflen / sizeof(PMC*);
     /* Mark the damn buffer as used! */
     trace_buf->flags |= BUFFER_live_FLAG;
-    for (i = 0; i < trace_buf->buflen; i++) {
+    for (i = 0; i < no_of_pmcs; i++) {
         if (cur_pmc[i]) {
             last = mark_used(cur_pmc[i], last);
         }
--- t/pmc/perlarray.t.old Sat Apr 20 18:09:16 2002
+++ t/pmc/perlarray.t Sat Apr 20 18:13:48 2002
@@ -1,6 +1,6 @@
 #! perl -w

-use Parrot::Test tests => 6;
+use Parrot::Test tests => 7;
 use Test::More;

 output_is(<<'CODE', <<'OUTPUT', "size of the array");
@@ -275,4 +275,18 @@
 ok 19
 OUTPUT

+output_is(<<'CODE', <<'OUTPUT', "Array resizing stress-test");
+new P0, PerlArray
+set I0, 0
+LOOP: set_keyed P0, I0, I0 # set P0[I0], I0
+inc I0
+lt I0, 1, LOOP
+get_keyed I1, P0, # set I1, P0[]
+print I1
+print "\n"
+end
+CODE
+
+OUTPUT
+
 1;
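
The bug class described above (treating a byte length as an element count) can be shown in isolation with a minimal C sketch. SketchBuffer, element_count and count_live are hypothetical stand-ins invented for this note; they are not Parrot's Buffer or mark_used, and the sketch assumes the buffer holds nothing but pointers.

```c
#include <stddef.h>

/* Stand-in for a buffer header: 'buflen' is a size in BYTES, as in the
 * report above, not a number of elements. */
typedef struct {
    void  *bufstart;
    size_t buflen;
} SketchBuffer;

/* Number of pointer-sized slots in the buffer: bytes / bytes-per-slot. */
size_t element_count(const SketchBuffer *buf)
{
    return buf->buflen / sizeof(void *);
}

/* Walk only the real slots and count the non-NULL ones. Looping up to
 * buf->buflen instead would read sizeof(void *) times too far. */
size_t count_live(const SketchBuffer *buf)
{
    void **slots = buf->bufstart;
    size_t n = element_count(buf);
    size_t live = 0;
    for (size_t i = 0; i < n; i++) {
        if (slots[i]) {
            live++;
        }
    }
    return live;
}
```

With a four-slot array, buflen is 4 * sizeof(void *) bytes, so a loop bounded by buflen would index well past the fourth slot. Dividing by the slot size first keeps the walk inside the allocation, which is what the no_of_pmcs line in the patch does.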
Re: [PATCH] Fix another GC segfault
> The problem is that trace_buf->buflen is the size of the buffer, and > not the number of PMCs contained in it, so the loop reads out of the > end of cur_pmc and into garbage data. The patch below fixes this, and > also adds a test-case to perlarray.t to stop it from coming back. I thought this description sounded familiar...I included a note about it when I was working on the parrot_reallocate_buffer patch, and included it along with the patch. It must have gotten lost in the discussion, but that's my fault for not opening a second email thread on the issue. I hope you didn't waste too much time tracking it down, but I'm glad to see that yours has tests where mine did not. Thanks, Mike Lambert
Re: [PATCH] intconst parameter type
G'day all. On Fri, Apr 19, 2002 at 01:08:46PM -0700, Steve Fink wrote: > Should it be all one keyword, or should 'const' be an orthogonal > modifier? IMO, one modifier, because "const" doesn't make sense on any direction but "in". > > - Nobody is likely to use it any time soon. > > I will be implementing jump tables in the regex compiler soonish. > Register destinations may (or may not) make that easier. I think a dedicated dense switch op (and a corresponding sparse switch op) might be a better solution. Otherwise, jump tables need arrays, and PMC arrays seem like overkill. The problem with switch ops is that either you need a variable number of arguments, or you need to put the switch arms somewhere else. Probably the const table is the sanest place, but then you need to come up with an assembler syntax and an automatic way to generate it from the ops file. > This reminds me of when this is necessary. How will we be calling > methods? We'll be looking up some kind of code address by index or > name and putting it into a register. (jump_i sounds plausible for > this.) Or jsr_i even. The JVM solution is to have a dedicated method call op, which is a great idea in principle, but may not work with everyone's object model. > The optimizer argument only matters when register addresses are > actually used. If you're right and nobody ever uses it, then the > optimizer doesn't care. When it is used, it's just as hard for the > optimizer to deal with > > unless P0, $REG_BRANCH_1 > ... > $REG_BRANCH_1: > jump I0 > > as it would be to deal with 'unless P0, I0', no? The problem is that when you use a register branch target anywhere, it means the entire _module_ can't have simple optimization done, because a branch could potentially go anywhere. I think the problem could be fixed with some semantic constraints. For example: - No jumps between subs except through the sub's entry point are allowed. 
- jump_i and jsr_i (plus maybe some others not yet written) are the only instructions which can branch to register targets. They take absolute addresses, not relative ones.

- There are only a limited number of ways to generate an absolute address, such as:

    - Returned from a call.
    - Vtable method lookup.
    - A special op which turns a relative address (which must be const) into an absolute address.

  Any attempt to call an absolute address which was not generated in one of these documented ways (e.g. by performing some computation) results in undefined behaviour.

Cheers,
Andrew Bromage
Re: [PATCH] intconst parameter type
On Sun, Apr 21, 2002 at 01:58:58PM +1000, Andrew J Bromage wrote:
> I think the problem could be fixed with some semantic constraints. For
> example:
>
> - No jumps between subs except through the sub's entry point
> are allowed.

Do we want to restrict subs to a single entry point? (for example, what if you want one "initial" entry point, and one "resume" entry point that figures out where processing left off?)

> - There are only a limited number of ways to generate an
> absolute address, such as:
>
> - Returned from a call.
> - Vtable method lookup.
> - A special op which turns a relative address (which
> must be const) into an absolute address.
>
> Any attempt to call an absolute address which was not
> generated in one of these documented ways (e.g. by performing
> some computation) results in undefined behaviour.

Not sure what you mean by "returned from a call". That sounds like you're restricting how addresses can be passed around. So I can't have an address in an integer variable and copy it to another? What's the difference between that and returning an address from a call?

Or do you mean return addresses? Ah, that would make sense.

So would this be the same as what you were proposing:

- The only valid absolute code addresses are those of
   - Labelled instructions
   - Instructions following bsr/jsr (return addresses)
- No arithmetic is possible on code addresses (the effects are undefined)
- Local label addresses are only valid within the scope containing the label (the result of jumping to someone else's local label is undefined, possibly triggering an exception in debug mode.)
- Otherwise, code addresses may be treated as plain INTVALs (stored in arrays, copied between registers, pushed on the user stack, etc.)

I want to be allowed to store the absolute address of a local label in a state structure, return from the subroutine, re-enter with the state information, and jump straight to that label. We'd need to define label scopes.
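
The "store a label in a state structure, return, re-enter, and jump straight back" pattern from the last paragraph can be approximated in portable C with a saved resume point and a switch. This is only an analogy under stated assumptions: GenState and next_value are names invented here, the resume point is a small integer rather than a real code address, and Parrot bytecode would use a stored label address with jump_i instead.

```c
/* State carried between calls: where to resume, plus a loop counter. */
typedef struct {
    int resume_point;  /* 0 = initial entry, 1 = resume, 2 = finished */
    int i;
} GenState;

/* Yields 0, 1, 2 on successive calls, then -1 forever. Each call
 * re-enters the function and dispatches on the saved resume point,
 * which plays the role of the stored label address. */
int next_value(GenState *st)
{
    switch (st->resume_point) {
    case 0:                      /* "initial" entry point */
        st->i = 0;
        st->resume_point = 1;    /* next call resumes below */
        return st->i;
    case 1:                      /* "resume" entry point */
        st->i++;
        if (st->i < 3) {
            return st->i;
        }
        st->resume_point = 2;    /* exhausted */
        return -1;
    default:
        return -1;
    }
}
```

The switch makes the set of possible re-entry targets explicit, which is roughly the property the proposed constraints are after: an optimizer can see every place the saved state may lead.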
Re: [PATCH] intconst parameter type
G'day. On Sat, Apr 20, 2002 at 10:06:10PM -0700, Steve Fink wrote: > Do we want to restrict subs to a single entry point? (for example, > what if you want one "initial" entry point, and one "resume" entry > point that figures out where processing left off?) Not necessarily. These are just ideas, remember. I want to restrict the number of ways you can get addresses of labels to ways that an optimizer/JIT compiler can fairly easily obtain them. > Not sure what you mean by "returned from a call". Functions should be able to return pointers to labels. > That sounds like > you're restricting how addresses can be passed around. So I can't have > an address in an integer variable and copy it to another? What's the > difference between that and returning an address from a call? No difference. You can copy labels as many times as you like, so long as you don't _manufacture_ them in arbitrary ways. > So would this be the same as what you were proposing: > > - The only valid absolute code addresses are those of >- Labelled instructions Currently, there are no labelled instructions. I don't really think we need them so long as we have some way to recover the labels (which is really what this discussion is all about). > - No arithmetic is possible on code addresses (the effects are >undefined) Absolutely. > - Local label addresses are only valid within the scope containing >the label (the result of jumping to someone else's local label is >undefined, possibly triggering an exception in debug mode.) At the moment, scope == module. In the future, I can think of several meanings for "scope" which make sense, but sub/method/function works for me too. I think there is a good argument to be made for limiting the organisation of bytecode files to have only one sub per code block, but I'm not sure the argument necessarily applies to Parrot. The JVM has this constraint, but it also doesn't easily support languages with Wirth-style nested subroutines. 
Parrot probably doesn't want to go to a great deal of trouble to support them, but we don't want to make it unnecessarily painful, either. > - Otherwise, code addresses may be treated as plain INTVALs (stored >in arrays, copied between registers, pushed on the user stack, >etc.) Yup. An optimizer, if it doesn't want to go to a lot of trouble, can make the conservative assumption that any jump_i can potentially jump to any label whose address has been taken somewhere else in the code. That's fine so long as this _is_ a conservative assumption. :-) Cheers, Andrew Bromage
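
The conservative assumption described above (any jump_i may reach any label whose address has been taken) amounts to tracking one flag per label. The C sketch below is a hypothetical illustration only: LabelInfo, note_address_taken, indirect_targets and the fixed MAX_LABELS limit are all invented for this note and do not correspond to any Parrot data structure.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_LABELS 64  /* arbitrary limit for the sketch */

/* One flag per label: has some instruction taken this label's address? */
typedef struct {
    bool   address_taken[MAX_LABELS];
    size_t n_labels;
} LabelInfo;

/* Record that the address of 'label' escapes into a register or value. */
void note_address_taken(LabelInfo *li, size_t label)
{
    if (label < li->n_labels) {
        li->address_taken[label] = true;
    }
}

/* Conservative successor set for an indirect jump: every label whose
 * address was taken anywhere. Writes at most 'cap' label indices into
 * 'out' and returns how many were written. */
size_t indirect_targets(const LabelInfo *li, size_t *out, size_t cap)
{
    size_t n = 0;
    for (size_t label = 0; label < li->n_labels && n < cap; label++) {
        if (li->address_taken[label]) {
            out[n++] = label;
        }
    }
    return n;
}
```

Labels whose addresses are never taken drop out of the set, so code that only uses direct branches is unaffected. The set is only a safe over-approximation if addresses cannot be manufactured by arithmetic, which is why the "no arithmetic on code addresses" rule matters.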
Re: [PATCH] intconst parameter type
Andrew J Bromage <[EMAIL PROTECTED]> writes:
> On Sat, Apr 20, 2002 at 10:06:10PM -0700, Steve Fink wrote:
>> - Local label addresses are only valid within the scope containing
>>   the label (the result of jumping to someone else's local label is
>>   undefined, possibly triggering an exception in debug mode.)
>
> At the moment, scope == module. In the future, I can think of
> several meanings for "scope" which make sense, but
> sub/method/function works for me too.
>
> I think there is a good argument to be made for limiting the
> organisation of bytecode files to have only one sub per code block,
> but I'm not sure the argument necessarily applies to Parrot. The
> JVM has this constraint, but it also doesn't easily support
> languages with Wirth-style nested subroutines. Parrot probably
> doesn't want to go to a great deal of trouble to support them, but
> we don't want to make it unnecessarily painful, either.

Um... If you're talking about what I think you're talking about you should remember that Perl 6 is going to have nested subroutines, both private and public:

    sub fact($n) {
        my sub tail_fact($n, $i) {
            when 0 { return $i }
            default { tail_fact($n - 1, $i * $n) }
        }
        tail_fact($n, 1);
    }

Or are 'Wirth style' nested subs different?

--
Piers
"It is a truth universally acknowledged that a language in possession of a rich syntax must be in need of a rewrite."
 -- Jane Austen?