date:20020420

RE: Regex and Matched Delimiters

2002-04-20 Thread Mike Lambert

> He then went on to describe something I didn't understand at all.
> Sorry.

A few corrections to what you wrote:

To avoid the problem of extending {} to support new features with a
character 'x', without breaking stuff that might have an 'x' immediately
after the '{', my proposal is to require one space after the { before the
real regex appears.

So to correct the example I wrote of /{a|b|c}+/, it would become
/{ a|b|c}+/. It looks a bit weird if you're accustomed to perl5's behavior
of (?:). { \ } would then match a single space. {  } would do nothing,
since the second space falls under the whitespace-insensitive regex rule.

Now, since we require a space, all the characters before this space
now become 'special' in some form. This fact allows us to add new
special characters and map them to functionality, if perl doesn't
already do that.

For example, I would register | to be:
sub zerowidth ($regex) {
  return <<"EOF";
  push \@pos, pos();
  regex_run $( qr/$regex/ );
  pos() = pop \@pos;
  EOF
}

And conversely, _ would be written as:
sub regularwidth ($r) {
  return "regex_run $( qr/$r/ )"
}

This would allow me to do whacky things, like register these:
sub plus ($r) {return "\$level++;regex_run $( qr/$r/ )"}
sub minus($r) {return "\$level--;check();regex_run $( qr/$r/ )"}
sub check {assert($level>0)}
{ {+ \(} | {- \)} | . } ({ check() })
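
As a rough illustration of the counting idea behind plus/minus/check, here
is a plain Python sketch. It is not the proposed regex syntax, and the
function name balanced is made up for this example:

```python
def balanced(text: str) -> bool:
    """Return True if every ')' closes an earlier '(' and none stay open."""
    level = 0
    for ch in text:
        if ch == "(":
            level += 1        # the role of the '+' op above
        elif ch == ")":
            level -= 1        # the role of the '-' op above
            if level < 0:     # a ')' with nothing open: fail the match
                return False
        # any other character is consumed like '.' in the regex
    return level == 0         # final check once the input is consumed
```

The regex version would do the same bookkeeping inside the match, with a
failed check causing backtracking instead of an early return.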

Brent and I also disagreed on the use of sexegers. japhy has done more
thinking about this than either of us has, so perhaps we should just let
him weigh in on the issue. I proposed that {< be a sexeger, whereas he
prefers {< be a lookbehind. I'll use the former for the rest of this
discussion, since on IRC we had to agree to disagree on it.

Regardless, having support for sexegers supports all of the behavior of
lookbehinds, since lookbehinds are just a constant-string, and could never
be a regex in Perl5. I still like the way lookbehinds work, and am not
suggesting that they disappear entirely, but rather that they be changed
into an underlying sexeger form.

sub b ($reg) {
  my $ger = reverse $reg;
  return "regex_run qr/{<|= \Q$ger\E}/"
}

The following perl5 regex:
/(?<=foo)bar/
is now equivalent to:
/(b foo)bar/
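
To make the "lookbehind as a reversed regex" idea concrete, here is a small
Python sketch. It assumes nothing about the eventual perl6 syntax; the
function name match_after and its parameters are hypothetical. The
lookbehind is written as a regex over the reversed text and is anchored at
the point where the forward match begins:

```python
import re

def match_after(text, reversed_behind, ahead):
    """Start positions of `ahead` whose preceding text, read backwards,
    matches the regex `reversed_behind` from its first character."""
    behind_re = re.compile(reversed_behind)
    hits = []
    for m in re.finditer(ahead, text):
        before_reversed = text[:m.start()][::-1]
        if behind_re.match(before_reversed):
            hits.append(m.start())
    return hits

# Constant-string case: the same positions as the perl5-style (?<=foo)bar.
sample = "foobar xbar foobar"
fixed = match_after(sample, "oof", "bar")

# Variable-width case: "preceded by f followed by one or more o",
# written backwards as o+f.
variable = match_after("fooobar obar", "o+f", "bar")
```

Because the reversed pattern is an ordinary regex, the variable-width case
costs nothing extra, which is the sense in which a sexeger covers
everything a constant-string lookbehind can do.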

> The only major drawback I can see to that is the naïve user might type
> {.*?}+ expecting a bunch of text in bold tags and getting a

Sorry, I forgot to make that clearer. The above regex would have to be
written as { .*}+ to work properly, specifying that there are no
special tokens.

> Here's how it works:
>   -If the code returns undef, we backtrack.
>   -If the code returns the empty string, we move on.
>   -If the code returns anything else, we interpolate that into the
> regex.
>
> So, we now just have ({}).

({print "hello"}) will, unfortunately, be really weird. Since print
returns 1, the block will return 1. We'd have to force-specify a return
value of "". While simplifying the set of operators is good, and I want to
do a bunch of that myself, we should probably offer a way to get the
earlier 'execute with no interpolated regex' behavior, somehow built on
top of the existing ({}) operator.
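
The three-way rule quoted above can be sketched as a small dispatch
function. This is Python, not Perl, and the name apply_code_result and the
returned tuples are invented for illustration; None stands in for undef:

```python
def apply_code_result(result):
    """Decide what the regex engine does with an embedded block's value."""
    if result is None:            # undef: the block failed, so backtrack
        return ("backtrack", None)
    if result == "":              # empty string: nothing to add, move on
        return ("continue", None)
    # anything else is interpolated into the regex as text
    return ("interpolate", str(result))

# A block whose last statement is a print yields print's return value (1
# in Perl), so the engine would try to match a literal "1".
after_print = apply_code_result(1)
```

An explicit trailing "" in the block avoids the surprise, at the cost of
some noise in every side-effect-only block.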

Reflecting on it all a bit, if we're willing to make a larger sacrifice
in backwards compatibility, things might make more sense:
- {} would be the code operator, which was specified up above as ({}).
  This makes more sense, imo, since {} is traditionally used for
  blocks.
- () would have all the special semantics described for {} in this
  thread.

The default for () could still be capturing, so ( a*) performs capturing
on /a*/. We'd then have to define another pair of symbols for turning
capturing on and off. All instances of Perl5's (blah)  would convert to
( blah), and all instances of the special operators in perl5 a la
(?@#blah) would translate as they did before, but also specifying the
'don't capture within these parens' special identifier.

Basically, I'm trying to propose a system which makes all the regex stuff
become orthogonal. Rather than creating a bunch of hardcoded types of (?>=
regex operators, instead define small functionalities which can be
combined in ways to emulate these tried and true constructs.

Brent, let me know if I'm still spouting gibberish on this email. :)

Mike Lambert

Re: Roadmap for Parrot

2002-04-20 Thread Chip Salzenberg


According to [EMAIL PROTECTED] (Dan Sugalski):
>At the moment, we don't have to support cascading lexical 
>scratchpads--since we know at compile time which variables we're 
>accessing and where they come from, we can install trampoline entries 
>in the current scope's scratchpad and not have to search outward at 
>runtime.

Once you start down that path, forever will it dominate your closure
implementation.  I suggest you do lexical scopes right from the start.
Surely they're not that much harder than a trampoline.  Are they?
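
For readers unfamiliar with the term, a cascading scratchpad lookup can be
sketched roughly as follows. This is Python rather than Parrot code, and
the Scratchpad class and its method names are hypothetical, not anything
from the Parrot source:

```python
class Scratchpad:
    """One lexical scope: its own variables plus a link to the enclosing scope."""

    def __init__(self, parent=None):
        self.vars = {}
        self.parent = parent

    def lookup(self, name):
        # Walk outward through enclosing scopes until the name is found.
        pad = self
        while pad is not None:
            if name in pad.vars:
                return pad.vars[name]
            pad = pad.parent
        raise NameError(name)


outer = Scratchpad()
outer.vars["x"] = 1
inner = Scratchpad(parent=outer)
inner.vars["y"] = 2
```

Here inner.lookup("x") finds the variable by searching outward at run
time, which is the step the trampoline approach avoids by resolving the
location at compile time.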

PS: I'm back.  }:-)
-- 
Chip Salzenberg - a.k.a.  -<[EMAIL PROTECTED]>
 "It furthers one to have somewhere to go."

Please rename 'but' to 'has'.

2002-04-20 Thread Daniel S. Wilkerson


Larry,

Please don't use 'but' to associate runtime properties to things.
Please call it 'has'.

First, 'but' is just strange.  I have a thing and I want to tell you it is
red, so I say 'but'.  Huh?

Using 'has' makes a nice parallel with 'is' for compile time properties:
What you are is determined at compile time, what you have is determined
at run time.

Daniel

Re: Call stack manipulation?

2002-04-20 Thread Andrew J Bromage

G'day all.

On Fri, Apr 19, 2002 at 09:55:46AM +0100, Piers Cawley wrote:

> It's fine for partial continuations certainly, but less fine if you
> want to implement full continuations which require you to save the
> state of the entire stack. I was hoping I'd find a way to do this
> without having to wait for Parrot to get its own continuations.

I suspect that using relative offsets for jump instructions might have
been a bad idea in retrospect.  Sure it's faster, but now we have
prederef, which could turn absolute offsets into relative offsets at
run time.

> If
> 
>   set I0, LABEL
>  
> sets I0 to an address which I can jump to from anywhere then I have
> what I need.

One possible solution is to introduce a new op:

inline op codeoffset(inconst INT, out INT)

The idea is to turn a relative offset into an absolute address which
can then be passed to "goto ADDRESS()".

Another possibility is:

inline op adjustoffset(in INT, inconst INT, inconst INT, out INT) {
$4 = $1 + $2 - $3;
goto NEXT();
}

The idea is that this:

adjustoffset I0, LABEL1, LABEL2, I1

will adjust an offset I0 (which is relative to LABEL1) to make it
relative to LABEL2.  That way, you can canonicalise all your offsets
and just re-adjust them before jumping.

(Note: Yes, this could be done with two existing ops.  The benefit of
making it another op is that a JIT compiler can easily compile it away
if it chooses to use only absolute offsets.)
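
A worked example of the arithmetic, as a Python sketch rather than an ops
file entry. The function name adjust_offset and the concrete label
positions are made up for illustration:

```python
def adjust_offset(offset, label1, label2):
    """Re-express an offset relative to label1 as one relative to label2.

    Mirrors the op body: $4 = $1 + $2 - $3.
    """
    return offset + label1 - label2

# Suppose LABEL1 sits at position 10 and LABEL2 at position 25.
label1, label2 = 10, 25
offset_from_label1 = 7                       # target is at 10 + 7 = 17
offset_from_label2 = adjust_offset(offset_from_label1, label1, label2)
target = label2 + offset_from_label2         # still 17
```

The target is unchanged; only the reference point moves, which is what
lets all offsets be stored in one canonical form.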

> What's SECD by the way?

http://www.wikipedia.com/wiki/SECD+machine

If you have a well-stocked university library handy:

@book{Henderson80,
  author  = {Peter Henderson},
  title   = {{Functional Programming -- Application and Implementation}},
  publisher = {Prentice/Hall International},
  year= 1980,
  series  = {Series in Computer Science},
  topics  = {FP - General,FP - Implementation}
}

Cheers,
Andrew Bromage

Re: [PATCH] Op metadata

2002-04-20 Thread Andrew J Bromage

G'day all.

On Fri, Apr 19, 2002 at 05:28:12PM -0300, Daniel Grunblatt wrote:

> Add me to the list, because I'm writing the jit optimizer and ran into
> the same problem, we must add some metadata otherwise I will end up
> hard-coding all the information deep into the optimizer and that is a Bad
> Thing (tm). I don't know if this is the best way to solve this but I
> rather like it.

I'm leaning towards invoking the politician's syllogism:

 We must do something.
 This is something.

 Therefore we must do this.

We can always delete it later if (read: when) we think of something
better.

Cheers,
Andrew Bromage

Re: Regex and Matched Delimiters

2002-04-20 Thread Me


Let me see if I understand the final version of your (Mike's)
suggestions and where it appears to be headed:

Backwards compatibility:
perl5 extended syntax still works in perl6 if one happens to use it.

Forward conversion:
Automatic conversion of relevant perl5 regex syntax to perl6 is simple.

New extension syntax:
1. Syntax is (ops data).
2. There are a bunch of built-in ops, but user can define new ones.

[2c. What about ( data) or (ops data) normally means non-capturing,
($2 data) captures into $2, ($foo data) captures into $foo?]

Rationalized ops syntax:
Ops string consists of arbitrarily ordered individual op characters.
(eg '<' signifies a look behind, '!<' signifies fail if look behind
match.)

Embedded code:
Code is inserted using {} with something other than digits in them.

(Other stuff, such as sexegers, ignored.)

--
ralph

Re: Please rename 'but' to 'has'.

2002-04-20 Thread Me


I agree 'but' seems a tad odd, and I like the elegance of your
suggestion at first sight. However...

> First, 'but' is just strange.  I have a thing and I want to tell you it is
> red, so I say 'but'.  Huh?

banana but red;
"foo" but false;

According to Larry, run time properties will most often be used
to contradict a built-in or compile time property. If he is right
about the dominant case being a contradiction, 'but' works
better for me than anything else I can think of, including 'now'
(explained below).

-

Even if usage to contradict a built-in or compile time property
is not the most common form of usage, it is still arguably the
case that if one keyword is to cover both cases (contradict
or not), then having the keyword "warn" that contradiction
may have occurred is better than having the keyword indicate
to a newbie that there is nothing to worry about, as would be
the case with 'has'.

Further, even if the "warn" notion is deemed unimportant,
'has' is still far from an ideal fit in many cases:

banana has red;
"foo" has false;

Yet another issue is use of 'is' in a conditional:

if ($foo is red) ...

This would be nice, and would work nicely if one uses a
different keyword for runtime properties, but works best
if that other word is more consistent with the notion of 'is'
than 'has' is.

One plausible middle ground word off the top of my head
that is odd in its own special way would be 'now':

banana now red;
"foo" now false;
banana now foo;
banana now tainted;

I read 'now' as somewhat suggestive of changing something.

--
ralph

Re: Regex and Matched Delimiters

2002-04-20 Thread Me


> [2c. What about ( data) or (ops data) normally means non-capturing,
> ($2 data) captures into $2, ($foo data) captures into $foo?]

which is cool where being explicit simplifies things, but
ain't where implicit is simpler. So, maybe add an op ('$'?)
or switch that makes parens capturing by default, ie as
per perl5.

--
ralph

Re: Please rename 'but' to 'has'.

2002-04-20 Thread David Wheeler


On 4/20/02 3:02 PM, "Me" <[EMAIL PROTECTED]> claimed:

>   banana now red;
>   "foo" now false;
>   banana now foo;
>   banana now tainted;
> 
> I read 'now' as somewhat suggestive of changing something.

I actually rather like this keyword. It not only suggests that something has
changed, but that it has changed at a particular time -- runtime. Compile
time properties just *are* (is), no matter what, unless and until you say,
at runtime, that it is *now* something else.

-- 
David Wheeler AIM: dwTheory
[EMAIL PROTECTED] ICQ: 15726394
http://david.wheeler.net/  Yahoo!: dew7e
   Jabber: [EMAIL PROTECTED]

[PATCH] Fix another GC segfault

2002-04-20 Thread Simon Glover



 I was playing around a bit with the set_keyed and get_keyed ops and found
 that this:

 new P0, PerlArray
 set I0, 0
 LOOP:   set_keyed P0, I0, I0
 inc I0
 lt I0, 1, LOOP
 end

 causes Parrot to segfault.

 The culprit appears to be this bit of code in trace_active_PMCs in
 resources.c:

  

  else {
      /* The only thing left is "buffer of PMCs" */
      Buffer *trace_buf = current->data;
      PMC **cur_pmc = trace_buf->bufstart;
      /* Mark the damn buffer as used! */
      trace_buf->flags |= BUFFER_live_FLAG;
      for (i = 0; i < trace_buf->buflen; i++) {
          if (cur_pmc[i]) {
              last = mark_used(cur_pmc[i], last);
          }
      }
  }

  

 The problem is that trace_buf->buflen is the size of the buffer, and
 not the number of PMCs contained in it, so the loop reads out of the
 end of cur_pmc and into garbage data. The patch below fixes this, and
 also adds a test-case to perlarray.t to stop it from coming back.

 Simon


--- resources.c.old Sat Apr 20 17:55:28 2002
+++ resources.c Sat Apr 20 17:58:45 2002
@@ -462,9 +462,10 @@
 /* The only thing left is "buffer of PMCs" */
 Buffer *trace_buf = current->data;
 PMC **cur_pmc = trace_buf->bufstart;
+UINTVAL no_of_pmcs = trace_buf->buflen / sizeof(PMC*);
 /* Mark the damn buffer as used! */
 trace_buf->flags |= BUFFER_live_FLAG;
-for (i = 0; i < trace_buf->buflen; i++) {
+for (i = 0; i < no_of_pmcs; i++) {
 if (cur_pmc[i]) {
 last = mark_used(cur_pmc[i], last);
 }


--- t/pmc/perlarray.t.old   Sat Apr 20 18:09:16 2002
+++ t/pmc/perlarray.t   Sat Apr 20 18:13:48 2002
@@ -1,6 +1,6 @@
 #! perl -w

-use Parrot::Test tests => 6;
+use Parrot::Test tests => 7;
 use Test::More;

 output_is(<<'CODE', <<'OUTPUT', "size of the array");
@@ -275,4 +275,18 @@
 ok 19
 OUTPUT

+output_is(<<'CODE', <<'OUTPUT', "Array resizing stress-test");
+new P0, PerlArray
+set I0, 0
+LOOP:   set_keyed P0, I0, I0 # set P0[I0], I0
+inc I0
+lt I0, 1, LOOP
+get_keyed I1, P0,    # set I1, P0[]
+print I1
+print "\n"
+end
+CODE
+
+OUTPUT
+
 1;
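
The size mismatch behind the bug can be shown in isolation. This is a
Python sketch using ctypes only to obtain the platform pointer size; the
name slot_count is hypothetical and nothing here is taken from resources.c
beyond the division by sizeof(PMC*):

```python
import ctypes

# Size in bytes of one pointer on this platform, standing in for sizeof(PMC*).
POINTER_SIZE = ctypes.sizeof(ctypes.c_void_p)

def slot_count(buflen_bytes, slot_size=POINTER_SIZE):
    """Number of pointer-sized slots in a buffer of buflen_bytes bytes."""
    return buflen_bytes // slot_size

slots = 4                                   # the buffer really holds 4 pointers
buflen = slots * POINTER_SIZE               # but its length is stored in bytes
overrun = buflen - slot_count(buflen)       # extra iterations of the old loop
```

With 8-byte pointers the old loop would have run 32 times over a 4-slot
array; dividing by the slot size brings the bound back to 4.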

Re: [PATCH] Fix another GC segfault

2002-04-20 Thread Mike Lambert


>  The problem is that trace_buf->buflen is the size of the buffer, and
>  not the number of PMCs contained in it, so the loop reads out of the
>  end of cur_pmc and into garbage data. The patch below fixes this, and
>  also adds a test-case to perlarray.t to stop it from coming back.

I thought this description sounded familiar...I included a note about it
when I was working on the parrot_reallocate_buffer patch, and included it
along with the patch. It must have gotten lost in the discussion, but
that's my fault for not opening a second email thread on the issue. I
hope you didn't waste too much time tracking it down, but I'm glad to
see that yours has tests where mine did not.

Thanks,
Mike Lambert

Re: [PATCH] intconst parameter type

2002-04-20 Thread Andrew J Bromage

G'day all.

On Fri, Apr 19, 2002 at 01:08:46PM -0700, Steve Fink wrote:

> Should it be all one keyword, or should 'const' be an orthogonal
> modifier?

IMO, one modifier, because "const" doesn't make sense on any direction
but "in".

> > - Nobody is likely to use it any time soon.
> 
> I will be implementing jump tables in the regex compiler soonish.
> Register destinations may (or may not) make that easier.

I think a dedicated dense switch op (and a corresponding sparse switch
op) might be a better solution.  Otherwise, jump tables need arrays,
and PMC arrays seem like overkill.

The problem with switch ops is that either you need a variable number
of arguments, or you need to put the switch arms somewhere else.
Probably the const table is the sanest place, but then you need to come
up with an assembler syntax and an automatic way to generate it from
the ops file.
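
To illustrate the difference between the two kinds of switch, here is a
Python sketch. The function names and the idea of passing the arms as a
list or dict are assumptions made for the example, not a proposed op
signature:

```python
def dense_switch(arms, default, key, base=0):
    """Dense switch: arms[i] is the target for key == base + i."""
    index = key - base
    if 0 <= index < len(arms):
        return arms[index]        # one bounds check, one indexed load
    return default

def sparse_switch(arms, default, key):
    """Sparse switch: arms maps the few interesting keys to targets."""
    return arms.get(key, default)

dense_target = dense_switch(["L_A", "L_B", "L_C"], "L_DEFAULT", 98, base=97)
sparse_target = sparse_switch({10: "L_TEN", 1000: "L_K"}, "L_DEFAULT", 1000)
```

In both cases the arms are a table of constants, which is why the const
table looks like the natural home for them.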

> This reminds me of when this is necessary. How will we be calling
> methods? We'll be looking up some kind of code address by index or
> name and putting it into a register. (jump_i sounds plausible for
> this.)

Or jsr_i even.

The JVM solution is to have a dedicated method call op, which is a great
idea in principle, but may not work with everyone's object model.

> The optimizer argument only matters when register addresses are
> actually used. If you're right and nobody ever uses it, then the
> optimizer doesn't care. When it is used, it's just as hard for the
> optimizer to deal with
> 
>   unless P0, $REG_BRANCH_1
>   ...
>   $REG_BRANCH_1:
>   jump I0
> 
> as it would be to deal with 'unless P0, I0', no?

The problem is that when you use a register branch target anywhere, it
means the entire _module_ can't have simple optimization done, because
a branch could potentially go anywhere.

I think the problem could be fixed with some semantic constraints.  For
example:

- No jumps between subs except through the sub's entry point
  are allowed.

- jump_i and jsr_i (plus maybe some others not yet written) are
  the only instructions which can branch to register targets.
  They take absolute addresses, not relative ones.

- There are only a limited number of ways to generate an
  absolute address, such as:

- Returned from a call.
- Vtable method lookup.
- A special op which turns a relative address (which
  must be const) into an absolute address.

  Any attempt to call an absolute address which was not
  generated in one of these documented ways (e.g. by performing
  some computation) results in undefined behaviour.

Cheers,
Andrew Bromage

Re: [PATCH] intconst parameter type

2002-04-20 Thread Steve Fink

On Sun, Apr 21, 2002 at 01:58:58PM +1000, Andrew J Bromage wrote:
> I think the problem could be fixed with some semantic constraints.  For
> example:
> 
>   - No jumps between subs except through the sub's entry point
> are allowed.

Do we want to restrict subs to a single entry point? (for example,
what if you want one "initial" entry point, and one "resume" entry
point that figures out where processing left off?)

>   - There are only a limited number of ways to generate an
> absolute address, such as:
> 
>   - Returned from a call.
>   - Vtable method lookup.
>   - A special op which turns a relative address (which
> must be const) into an absolute address.
> 
> Any attempt to call an absolute address which was not
> generated in one of these documented ways (e.g. by performing
> some computation) results in undefined behaviour.

Not sure what you mean by "returned from a call". That sounds like
you're restricting how addresses can be passed around. So I can't have
an address in an integer variable and copy it to another? What's the
difference between that and returning an address from a call?

Or do you mean return addresses? Ah, that would make sense.

So would this be the same as what you were proposing:

 - The only valid absolute code addresses are those of
   - Labelled instructions
   - Instructions following bsr/jsr (return addresses)
 - No arithmetic is possible on code addresses (the effects are
   undefined)
 - Local label addresses are only valid within the scope containing
   the label (the result of jumping to someone else's local label is
   undefined, possibly triggering an exception in debug mode.)
 - Otherwise, code addresses may be treated as plain INTVALs (stored
   in arrays, copied between registers, pushed on the user stack,
   etc.)

I want to be allowed to store the absolute address of a local label in
a state structure, return from the subroutine, re-enter with the state
information, and jump straight to that label.

We'd need to define label scopes.

Re: [PATCH] intconst parameter type

2002-04-20 Thread Andrew J Bromage

G'day.

On Sat, Apr 20, 2002 at 10:06:10PM -0700, Steve Fink wrote:

> Do we want to restrict subs to a single entry point? (for example,
> what if you want one "initial" entry point, and one "resume" entry
> point that figures out where processing left off?)

Not necessarily.  These are just ideas, remember.  I want to restrict
the number of ways you can get addresses of labels to ways that an
optimizer/JIT compiler can fairly easily obtain them.

> Not sure what you mean by "returned from a call".

Functions should be able to return pointers to labels.

> That sounds like
> you're restricting how addresses can be passed around. So I can't have
> an address in an integer variable and copy it to another? What's the
> difference between that and returning an address from a call?

No difference.  You can copy labels as many times as you like, so long
as you don't _manufacture_ them in arbitrary ways.

> So would this be the same as what you were proposing:
> 
>  - The only valid absolute code addresses are those of
>    - Labelled instructions

Currently, there are no labelled instructions.  I don't really think we
need them so long as we have some way to recover the labels (which is
really what this discussion is all about).

>  - No arithmetic is possible on code addresses (the effects are
>    undefined)

Absolutely.

>  - Local label addresses are only valid within the scope containing
>    the label (the result of jumping to someone else's local label is
>    undefined, possibly triggering an exception in debug mode.)

At the moment, scope == module.  In the future, I can think of several
meanings for "scope" which make sense, but sub/method/function works
for me too.

I think there is a good argument to be made for limiting the
organisation of bytecode files to have only one sub per code block, but
I'm not sure the argument necessarily applies to Parrot.  The JVM has
this constraint, but it also doesn't easily support languages with
Wirth-style nested subroutines.  Parrot probably doesn't want to go to
a great deal of trouble to support them, but we don't want to make it
unnecessarily painful, either.

>  - Otherwise, code addresses may be treated as plain INTVALs (stored
>    in arrays, copied between registers, pushed on the user stack,
>    etc.)

Yup.  An optimizer, if it doesn't want to go to a lot of trouble, can
make the conservative assumption that any jump_i can potentially jump
to any label whose address has been taken somewhere else in the code.

That's fine so long as this _is_ a conservative assumption. :-)
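
A sketch of that conservative analysis, in Python. The instruction
encoding and the take_addr opcode are hypothetical stand-ins for whatever
"special op" ends up producing absolute addresses:

```python
def possible_indirect_targets(program):
    """Labels that any jump_i might reach, under the conservative rule.

    program is a list of (opcode, operands) tuples. Only labels whose
    address is explicitly taken can ever end up in a register.
    """
    taken = set()
    for opcode, operands in program:
        if opcode == "take_addr":      # hypothetical: take_addr REG, LABEL
            taken.add(operands[-1])
    return taken

program = [
    ("take_addr", ("I0", "RESUME")),
    ("branch", ("DONE",)),             # direct branch: target known statically
    ("jump_i", ("I0",)),               # indirect: could be any taken label
]
targets = possible_indirect_targets(program)
```

The result is only safe if addresses cannot be manufactured by
arithmetic, which is exactly the constraint being discussed.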

Cheers,
Andrew Bromage

Re: [PATCH] intconst parameter type

2002-04-20 Thread Piers Cawley


Andrew J Bromage <[EMAIL PROTECTED]> writes:
> On Sat, Apr 20, 2002 at 10:06:10PM -0700, Steve Fink wrote:
>>  - Local label addresses are only valid within the scope containing
>>    the label (the result of jumping to someone else's local label is
>>    undefined, possibly triggering an exception in debug mode.)
>
> At the moment, scope == module.  In the future, I can think of
> several meanings for "scope" which make sense, but
> sub/method/function works for me too.
>
> I think there is a good argument to be made for limiting the
> organisation of bytecode files to have only one sub per code block,
> but I'm not sure the argument necessarily applies to Parrot.  The
> JVM has this constraint, but it also doesn't easily support
> languages with Wirth-style nested subroutines.  Parrot probably
> doesn't want to go to a great deal of trouble to support them, but
> we don't want to make it unnecessarily painful, either.

Um... If you're talking about what I think you're talking about you
should remember that Perl 6 is going to have nested subroutines, both
private and public:

  sub fact($n) {
    my sub tail_fact($n, $i) {
      when 0 { return $i }
      default { tail_fact($n - 1, $i * $n) }
    }
    tail_fact($n, 1);
  }

Or are 'Wirth style' nested subs different?

-- 
Piers

   "It is a truth universally acknowledged that a language in
possession of a rich syntax must be in need of a rewrite."
 -- Jane Austen?
