Re: To get things started...

2000-11-21 Thread Tom Hughes

In message <[EMAIL PROTECTED]>
  Dan Sugalski <[EMAIL PROTECTED]> wrote:

> At 10:18 AM 11/21/00 -0800, Benjamin Stuhl wrote:
>
> >Well, it would (IMHO) make more sense to have
> >perl6_parse_script (I do tend to follow
> >{subsystem,verb,object} naming...)
>
> Or Perl$parse_script, but that's a matter of taste, I suppose. :)

Given that it isn't a valid C identifier, yes... Unless you're
using VAXC or DECC of course, which was your point I assume ;-)
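
For anyone missing the joke, the portable rule is that a C identifier is built from letters, digits and underscores and may not start with a digit; `$` is only accepted where a compiler allows it as an extension. A minimal sketch of that rule follows. The helper name `is_portable_c_identifier` is made up for illustration and is not part of any perl API.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if name uses only the portable
 * identifier alphabet (letters, digits, underscore, no leading digit).
 * '$' is a compiler extension, so it is rejected here. */
int is_portable_c_identifier(const char *name)
{
    size_t i;

    assert(name != NULL);              /* passing NULL is a caller bug */
    if (name[0] == '\0' || isdigit((unsigned char)name[0]))
        return 0;
    for (i = 0; name[i] != '\0'; i++) {
        unsigned char c = (unsigned char)name[i];
        if (!isalnum(c) && c != '_')
            return 0;
    }
    return 1;
}
```

Under this check `perl6_parse_script` passes and `Perl$parse_script` does not, which is the portability concern being raised.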

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/
...Discoveries are made by not following instructions.




Perl 6 paper

2000-11-21 Thread Skud

This coming Saturday, I'm presenting a paper on Perl 6 (the story so
far) at the Australian Open Source Symposium.

Is anyone interested in looking over my notes and commenting on them
in the next couple of days?

K.

-- 
Kirrily 'Skud' Robert - [EMAIL PROTECTED] - http://infotrope.net/
Today is the first day of the rest of your life.  Give up now.



Re: Guidelines for internals proposals and documentation

2000-11-21 Thread Dan Sugalski

At 02:45 PM 11/17/00 +, David Grove wrote:

>Dan Sugalski <[EMAIL PROTECTED]> wrote:
>
>  > At 10:19 AM 11/17/00 -0800, Ken Fox wrote:
>  > >However, I don't want to see early (premature) adoption of fundamental
>  > >pieces like the VM or parser. It makes sense to me to explore many
>  > >possible designs and pick and choose between them. Also, if we can keep
>  > >external API design separate from internal design I think we'll have
>  > >more wiggle room to experiment. (That's one of the big problems with
>  > >perl 5 right now.)
>  >
>  > That's one of the reasons I'd like to work on the APIs first. I realize
>  > that doing even that will have an effect on the design of the pieces
>  > behind the APIs, but we have to start somewhere.
>
>But.. but... but... we don't even have a design spec. I mean, we don't
>even know for sure what Perl 6 is going to look like for certain, inside
>or outside. Wouldn't we have to know the outside before we try to put the
>insides together?

No, not really. For the actual code we will, of course, but there's a lot 
we can do now. (And a good part of the parser could still be written now, 
since most of the changes will likely be reasonably trivial) The APIs perl 
presents to the world are pretty much independent of the language.

For example, we can take a good stab at the extension API now--regardless 
of how the language looks, extensions will still need to get and set 
scalar, hash, and array values. Perl would have to change a *lot* for that 
to be no longer valid. The API presented to an embedding program similarly 
can be worked on--the fact that the language might change doesn't alter the 
syntax of the run_perl_code() function. (or whatever we call it)
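
As a rough illustration of the get-and-set idea, here is a minimal sketch of a scalar accessor pair that hides the representation from extension code. Every name in it (`ext_scalar`, `ext_scalar_set_string`, and so on) is hypothetical; nothing here is an agreed perl 6 interface.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* All names are hypothetical. Extension code would only ever see the
 * typedef; the struct body would live in a private header. */
typedef struct ext_scalar ext_scalar;

struct ext_scalar {
    char *str;   /* owned copy of the value, or NULL if never set */
};

ext_scalar *ext_scalar_new(void)
{
    ext_scalar *sv = malloc(sizeof *sv);
    if (sv != NULL)
        sv->str = NULL;
    return sv;
}

/* Store a copy of value; returns 0 on success, -1 if out of memory. */
int ext_scalar_set_string(ext_scalar *sv, const char *value)
{
    char *copy;

    assert(sv != NULL && value != NULL);
    copy = malloc(strlen(value) + 1);
    if (copy == NULL)
        return -1;
    strcpy(copy, value);
    free(sv->str);
    sv->str = copy;
    return 0;
}

/* Read-only view of the current value ("" if never set). */
const char *ext_scalar_get_string(const ext_scalar *sv)
{
    assert(sv != NULL);
    return sv->str != NULL ? sv->str : "";
}

void ext_scalar_free(ext_scalar *sv)
{
    if (sv != NULL) {
        free(sv->str);
        free(sv);
    }
}
```

Because callers go through the functions only, the struct layout could change between versions without touching extension source.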

We also do have, generally speaking, a picture of both perl (since Larry 
has said we're not gutting the language entirely) and the internal 
structure. I've been a bit lax in presenting that internal picture, but 
I'll fix that in a little bit.

Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: Perl 6 paper

2000-11-21 Thread David Grove

I would, certainly. But I also think that the group as a whole would enjoy
the preview.

Kirrily "Skud" Robert <[EMAIL PROTECTED]> wrote:

 > This coming Saturday, I'm presenting a paper on Perl 6 (the story so
 > far) at the Australian Open Source Symposium.
 >
 > Is anyone interested in looking over my notes and commenting on them
 > in the next couple of days?
 >
 > K.
 >
 > --
 > Kirrily 'Skud' Robert - [EMAIL PROTECTED] - http://infotrope.net/
 > Today is the first day of the rest of your life.  Give up now.
 >




SvPV*

2000-11-21 Thread Nicholas Clark

(I'm not sure if I've missed all the fun here before I subscribed, but
I can't find anything on the RFC list that mentions the following)

perl5 has a tangle of SvPV macros to allow C code to get a pointer
to the scalar. (or the "private", with or without the length, and
more relating to utf8 that don't even appear to be documented)

Has any thought yet been given to the API to get scalars?

Jarkko posted an idea on p5p of "Virtual Values" which would permit a
scalar to point to another scalar's buffer, rather than its own.
Currently the perl5 API assumes that you get a read-write pointer, and that
the thing it points to is "\0" terminated. This makes it hard to implement
copy on write, or to allow a pointer to a sub-length of the parent
scalar's buffer.

IIRC Ilya mailed p5p bemoaning the fact that perl's SVs use a continuous
buffer. A split-buffer representation (where a hole is allowed in the
middle of the buffer data) permits much faster replacement type operations,
as there is less copying, and you can move the hole around to suit your
needs.
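
To make the split-buffer idea concrete, here is a minimal gap-buffer sketch, assuming a fixed capacity and no growth. The type and function names (`gap_buffer`, `gb_insert`, ...) are invented for illustration and are not taken from perl's sources.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical split ("gap") buffer: the text lives in buf[0..gap_start)
 * and buf[gap_end..cap); the hole in between absorbs insertions. */
typedef struct {
    char  *buf;
    size_t cap;
    size_t gap_start;
    size_t gap_end;
} gap_buffer;

int gb_init(gap_buffer *gb, size_t cap)
{
    assert(gb != NULL && cap > 0);
    gb->buf = malloc(cap);
    gb->cap = cap;
    gb->gap_start = 0;
    gb->gap_end = cap;
    return gb->buf != NULL ? 0 : -1;
}

size_t gb_length(const gap_buffer *gb)
{
    return gb->cap - (gb->gap_end - gb->gap_start);
}

/* Move the hole so that it starts at logical position pos. */
void gb_move_gap(gap_buffer *gb, size_t pos)
{
    assert(pos <= gb_length(gb));
    if (pos < gb->gap_start) {
        size_t n = gb->gap_start - pos;
        memmove(gb->buf + gb->gap_end - n, gb->buf + pos, n);
        gb->gap_start -= n;
        gb->gap_end -= n;
    } else if (pos > gb->gap_start) {
        size_t n = pos - gb->gap_start;
        memmove(gb->buf + gb->gap_start, gb->buf + gb->gap_end, n);
        gb->gap_start += n;
        gb->gap_end += n;
    }
}

/* Insert n bytes at pos; returns -1 if the hole is too small
 * (a fuller implementation would grow the buffer instead). */
int gb_insert(gap_buffer *gb, size_t pos, const char *src, size_t n)
{
    if (n > gb->gap_end - gb->gap_start)
        return -1;
    gb_move_gap(gb, pos);
    memcpy(gb->buf + gb->gap_start, src, n);
    gb->gap_start += n;
    return 0;
}

/* Copy the logical contents into out (needs gb_length(gb) + 1 bytes). */
void gb_flatten(const gap_buffer *gb, char *out)
{
    size_t tail = gb->cap - gb->gap_end;
    memcpy(out, gb->buf, gb->gap_start);
    memcpy(out + gb->gap_start, gb->buf + gb->gap_end, tail);
    out[gb->gap_start + tail] = '\0';
}
```

Repeated edits near the same spot only copy the bytes between the old and new hole positions, which is where the saving over a single contiguous buffer comes from.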

So I was wondering if perl6 was going to replace SvPV* with something that
allows the caller to say whether they'd like

* read only or read write
* buffer all in one block or can cope with a hole (plus tell me where it is)
* null terminated buffer or don't care

and possibly

* data must be in utf8 or tell me what the data is in

although this might be better done as caller specifies 1 or more acceptable
encodings they could cope with, and SvPV* returns data in whatever
requires least work to translate consistent with maintaining accuracy.
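
The wish list above could be expressed as request flags plus a list of acceptable encodings. The following is only a sketch under that assumption; the flag names, the `str_encoding` enum and both functions are hypothetical, and "least work" is reduced to "hand back the native encoding if the caller accepts it".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request flags; the names are illustrative only. */
enum {
    STR_WANT_WRITE      = 1 << 0,  /* caller will modify the buffer   */
    STR_WANT_CONTIGUOUS = 1 << 1,  /* caller cannot cope with a hole  */
    STR_WANT_NUL        = 1 << 2   /* caller needs a trailing '\0'    */
};

typedef enum { ENC_LATIN1, ENC_UTF8, ENC_UTF16 } str_encoding;

/* Pick the encoding to return: the scalar's native encoding if the
 * caller accepts it (no translation), otherwise the caller's first
 * preference. Returns -1 if the caller listed nothing. */
int choose_encoding(str_encoding native, const str_encoding *ok, size_t n_ok)
{
    size_t i;

    assert(ok != NULL || n_ok == 0);
    if (n_ok == 0)
        return -1;
    for (i = 0; i < n_ok; i++)
        if (ok[i] == native)
            return (int)native;
    return (int)ok[0];
}

/* Decide whether the request can be served from the existing buffer
 * or needs a private copy in the requested shape. */
int needs_private_copy(unsigned flags, int is_shared, int has_hole, int has_nul)
{
    if ((flags & STR_WANT_WRITE) && is_shared)
        return 1;
    if ((flags & STR_WANT_CONTIGUOUS) && has_hole)
        return 1;
    if ((flags & STR_WANT_NUL) && !has_nul)
        return 1;
    return 0;
}
```

A read-only caller that can cope with a hole and does not need termination never forces a copy in this scheme, whatever state the buffer is in.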

In particular specifying read/write versus read only would allow
perl to treat scalars as copy-on-write which would mean things like
$a=$b wouldn't actually copy anything (wasting time and (shared) memory
pages) until either $a or $b got changed.
[I have this feeling that there's a bit of this already in sv.c, but I'm
not sure how much]
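
A minimal sketch of the copy-on-write behaviour described above, assuming a reference-counted buffer. It says nothing about what sv.c actually does; `cow_scalar` and its functions are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char  *data;      /* NUL-terminated for convenience        */
    size_t len;
    int    refcount;  /* how many scalars share this buffer    */
} cow_buf;

typedef struct { cow_buf *buf; } cow_scalar;

static cow_buf *cow_buf_new(const char *s, size_t len)
{
    cow_buf *b = malloc(sizeof *b);
    if (b == NULL)
        return NULL;
    b->data = malloc(len + 1);
    if (b->data == NULL) {
        free(b);
        return NULL;
    }
    memcpy(b->data, s, len);
    b->data[len] = '\0';
    b->len = len;
    b->refcount = 1;
    return b;
}

static void cow_buf_release(cow_buf *b)
{
    if (b != NULL && --b->refcount == 0) {
        free(b->data);
        free(b);
    }
}

int cow_init(cow_scalar *sv, const char *s)
{
    sv->buf = cow_buf_new(s, strlen(s));
    return sv->buf != NULL ? 0 : -1;
}

/* $a = $b: share the buffer and copy nothing.
 * dst->buf must be a valid buffer or NULL. */
void cow_assign(cow_scalar *dst, const cow_scalar *src)
{
    assert(src->buf != NULL);
    src->buf->refcount++;
    cow_buf_release(dst->buf);
    dst->buf = src->buf;
}

/* Read-only access never copies. */
const char *cow_read(const cow_scalar *sv)
{
    return sv->buf->data;
}

/* Read-write access: unshare first if anyone else holds the buffer. */
char *cow_write(cow_scalar *sv)
{
    if (sv->buf->refcount > 1) {
        cow_buf *own = cow_buf_new(sv->buf->data, sv->buf->len);
        if (own == NULL)
            return NULL;
        sv->buf->refcount--;
        sv->buf = own;
    }
    return sv->buf->data;
}

void cow_destroy(cow_scalar *sv)
{
    cow_buf_release(sv->buf);
    sv->buf = NULL;
}
```

The copy is deferred until `cow_write` is called, which is only possible because the caller has to say up front whether it wants read-only or read-write access.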

Nicholas Clark



Re: SvPV*

2000-11-21 Thread Jarkko Hietaniemi

On Tue, Nov 21, 2000 at 05:04:32PM +, Nicholas Clark wrote:
> (I'm not sure if I've missed all the fun here before I subscribed, but
> I can't find anything on the RFC list that mentions the following)
> 
> perl5 has a tangle of SvPV macros to allow C code to get a pointer
> to the scalar. (or the "private", with or without the length, and
> more relating to utf8 that don't even appear to be documented)
> 
> Has any thought yet been given to the API to get scalars?
> 
> Jarkko posted an idea on p5p of "Virtual Values" which would permit a
> scalar to point to another scalar's buffer, rather than its own.

That was one half, yes.  The other half was that a VV would
point to a 'window' or 'slice' of the other scalar's buffer, not
necessarily the whole buffer.
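
A window of this kind might be represented as a pointer plus a length into the parent's buffer. The sketch below assumes exactly that; `vv_window`, `vv_make` and `vv_equals` are hypothetical names, and the lifetime question (who keeps the parent alive) is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical "virtual value": a view onto part of another buffer.
 * It owns nothing and is generally NOT NUL-terminated. */
typedef struct {
    const char *start;
    size_t      len;
} vv_window;

/* Point vv at parent[offset .. offset+len). Returns -1 if the window
 * would fall outside the parent buffer. */
int vv_make(vv_window *vv, const char *parent, size_t parent_len,
            size_t offset, size_t len)
{
    assert(vv != NULL && parent != NULL);
    if (offset > parent_len || len > parent_len - offset)
        return -1;
    vv->start = parent + offset;
    vv->len = len;
    return 0;
}

/* Compare by length, since the byte after the window belongs to the
 * parent and is usually not '\0'. */
int vv_equals(const vv_window *vv, const char *s)
{
    return strlen(s) == vv->len && memcmp(vv->start, s, vv->len) == 0;
}
```

This is why an API that promises a "\0" terminated pointer gets in the way: the window cannot provide one without copying.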

> Currently the perl5 API assumes that you get a read-write pointer, and that
> the thing it points to is "\0" terminated. This makes it hard to implement
> copy on write, or to allow a pointer to a sub-length of the parent
> scalar's buffer.

What he said.

> IIRC Ilya mailed p5p bemoaning the fact that perl's SVs use a continuous
> buffer. A split-buffer representation (where a hole is allowed in the
> middle of the buffer data) permits much faster replacement type operations,
> as there is less copying, and you can move the hole around to suit your
> needs.

Yet another bummer of the current SVs is that they poorly fit into
'foreign memory' situations where the buffer is managed by something
else than Perl.  "No, thank you, Perl, keep your greedy fingers off
this chunk.  No, you may not play with it."

-- 
$jhi++; # http://www.iki.fi/jhi/
# There is this special biologist word we use for 'stable'.
# It is 'dead'. -- Jack Cohen



Re: SvPV*

2000-11-21 Thread Dan Sugalski

At 05:04 PM 11/21/00 +, Nicholas Clark wrote:
>(I'm not sure if I've missed all the fun here before I subscribed, but
>I can't find anything on the RFC list that mentions the following)
>
>perl5 has a tangle of SvPV macros to allow C code to get a pointer
>to the scalar. (or the "private", with or without the length, and
>more relating to utf8 that don't even appear to be documented)
>
>Has any thought yet been given to the API to get scalars?

Yup. The internal details will be hidden from the extension writer--if you 
do a get_string(PMC, UTF_8) you get back the UTF-8 encoded version of the 
scalar that PMC represents, regardless of any internal format. That way if 
some scalar function writer has some need to do odd things they can without 
having to worry about telling extension writers. It also means that an 
extension doesn't have to care that a PMC represents, say, a complex 
number--they ask for it in UTF-8 format and get back "4 + 3i" and that's fine.
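
One way to picture this is a per-type function pointer that knows how to stringify its own value. The following is a sketch only: the `pmc` layout, the `encoding` enum and the four-argument `get_string` signature are assumptions made so the example is self-contained, not the planned API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef enum { ENC_UTF_8, ENC_LATIN_1 } encoding;   /* hypothetical */

typedef struct pmc pmc;
struct pmc {
    /* Each scalar type supplies its own stringifier. */
    int (*to_utf8)(const pmc *self, char *out, size_t out_size);
    union {
        struct { int re, im; } cplx;
        long integer;
    } u;
};

static int complex_to_utf8(const pmc *self, char *out, size_t out_size)
{
    int n = snprintf(out, out_size, "%d + %di",
                     self->u.cplx.re, self->u.cplx.im);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}

static int integer_to_utf8(const pmc *self, char *out, size_t out_size)
{
    int n = snprintf(out, out_size, "%ld", self->u.integer);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}

pmc pmc_new_complex(int re, int im)
{
    pmc p;
    p.to_utf8 = complex_to_utf8;
    p.u.cplx.re = re;
    p.u.cplx.im = im;
    return p;
}

pmc pmc_new_integer(long value)
{
    pmc p;
    p.to_utf8 = integer_to_utf8;
    p.u.integer = value;
    return p;
}

/* The extension asks for a string in a given encoding and never looks
 * at the internal format. Only UTF-8 is handled in this sketch.
 * Returns 0 on success, -1 on failure or truncation. */
int get_string(const pmc *p, encoding enc, char *out, size_t out_size)
{
    assert(p != NULL && out != NULL);
    if (enc != ENC_UTF_8)
        return -1;
    return p->to_utf8(p, out, out_size);
}
```

The caller treats a complex number and an integer identically, which is the isolation being described.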

This isolation will also reduce cross-version breakage. While I'd like to 
eliminate that, I doubt it's entirely feasible.

It'll be possible to get the gory details if you want them, but then you'll 
have to go a step lower in the API and, well, the docs say "Here there be 
dragons". Or they will, at least.

One of the things we need to hammer out on the extension API list is 
exactly what sorts of things need to be generally exposed to extensions and 
what don't.

Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: To get things started...

2000-11-21 Thread David Grove


Simon Cozens <[EMAIL PROTECTED]> wrote:

 > On Tue, Nov 21, 2000 at 10:37:23AM +, David Grove wrote:
 > > I'm not sure that it's possible to do this, or desirable. If Larry
 > > wants Perl to use different modes, creoles, or ways of interpreting or
 > > understanding the "perl" language, then we have to let the parser
 > > have a bit more information.
 >
 > Yes, but these don't have to be external level calls.
 >
 > > As a point of clarification, I am seeing the external parser as that
 > > part of perl that sees the user's script directly.
 >
 > Likewise.
 >
 > This is why the "creole" rulesets are *not* external calls.
 >
 > Syntax definitions, in the form of Perl programs or pragmata, will have
 > to go through two stages before they can be used. First, they have to be
 > parsed as Perl code. This is a call to the external API of the parser.
 >
 > Once this is done, the resulting op tree must be processed so that it
 > can be turned into a data structure (representing the grammar) which can
 > be understood by the parser.

Actually, I think I'm getting it. In my model, what you guys are calling
"internal API" is basically what I'm leaving in that intermediate area,
kinda sorta? The parser that I'm talking about is what receives the perl
code.

So:

1. The External API has a pure syntax that has already gone through a
toplevel process to produce something identifiable to the "External API".

2. There remains a separation between the internal API and the external
api.

3. I'm attempting to call that toplevel parser the "internal api", and
need a word for it.

I'd submit that, since the creole parser needs to speak to the internal
API, it should become part of the spec for the entire parser.

Does it make sense that the creole parser be on the top of that chart I
made, and that the External API ends up what's in the middle?

p




Re: To get things started...

2000-11-21 Thread Dan Sugalski

At 12:44 PM 11/21/00 +, Simon Cozens wrote:
>On Tue, Nov 21, 2000 at 07:36:11AM -0500, David Grove wrote:
> >  > * The parser needs to be reentrant
> > No clue what this means. I need this defined in context.
>
>While parsing text, you should be able to dive into a separate bit of text,
>parse that, ("re-enter" the parser's routines) come out and carry on 
>*exactly* where you left off, without your state being lost.

And, even more so with perl, you may call out of the parser into a bit that 
then calls right back into it. For example:

   BEGIN {
       eval "\$foo = 12;";
   }

the parser finishes with the parsing of the BEGIN block and, before 
continuing (and thus we haven't exited the parser) calls into the 
compilation/execution modules, which then get the eval and call right back 
into the parser. (Which then calls the compilation/execution bits, which 
could potentially call into the parser again...)
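
A toy sketch of what that requirement means in code: all parser state lives in a local structure, so a call back into the parser from the "run this block" step cannot disturb the outer parse. The grammar (numbers are summed, `{...}` is a nested block) and the names `parse_text` and `run_block` are invented for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

typedef struct {
    const char *src;    /* text being parsed          */
    size_t      pos;    /* current offset into src    */
    long        total;  /* result accumulated so far  */
} parse_state;

long parse_text(const char *src, size_t len);

/* Stand-in for "compile and run this block": it is handed a separate
 * bit of text and calls straight back into the parser. */
static long run_block(const char *text, size_t len)
{
    return parse_text(text, len);
}

/* Toy grammar: decimal numbers are summed; "{...}" is a nested block
 * that is run (and so parsed) before the outer parse carries on.
 * Returns the sum, or -1 on an unbalanced block. */
long parse_text(const char *src, size_t len)
{
    parse_state st;     /* all state is local, so calls can nest */

    assert(src != NULL);
    st.src = src;
    st.pos = 0;
    st.total = 0;

    while (st.pos < len) {
        char c = st.src[st.pos];
        if (isdigit((unsigned char)c)) {
            long n = 0;
            while (st.pos < len && isdigit((unsigned char)st.src[st.pos]))
                n = n * 10 + (st.src[st.pos++] - '0');
            st.total += n;
        } else if (c == '{') {
            size_t start = ++st.pos;
            int depth = 1;
            long inner;
            while (st.pos < len && depth > 0) {
                if (st.src[st.pos] == '{')
                    depth++;
                else if (st.src[st.pos] == '}')
                    depth--;
                st.pos++;
            }
            if (depth != 0)
                return -1;
            inner = run_block(st.src + start, st.pos - start - 1);
            if (inner < 0)
                return -1;
            st.total += inner;   /* outer state is exactly as we left it */
        } else {
            st.pos++;            /* skip separators */
        }
    }
    return st.total;
}
```

Had `pos` and `total` been file-scope variables, the nested call would have overwritten them and the outer parse could not resume.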

Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: To get things started...

2000-11-21 Thread Dan Sugalski

At 12:46 PM 11/21/00 +, David Grove wrote:

>Dan Sugalski <[EMAIL PROTECTED]> wrote:
>
>  > At 10:37 AM 11/21/00 +, David Grove wrote:
>  > >Thanks for the clarifications, Simon.
>  > >
>  > >Simon Cozens <[EMAIL PROTECTED]> wrote:
>  > >If we were simply feeding it perl with a single syntax, we could get
>  > >away with a "one call" scheme. But since we're dealing with almost
>  > >certainly mutually exclusive syntax and semantics, it probably needs
>  > >more information.
>  >
>  > But we are. The call is probably going to be something like:
>  >
>  >    status = parse_perl(perl_interpreter *my_interp,
>  >                        char *script,
>  >                        struct HIR *end_result,
>  >                        long flags);
>  >
>  > the fact that the script has a "use pythonish;" in it is entirely
>  > irrelevant--the program calls into the parser, which returns a status
>  > and possibly a parsed representation of the program. The parser gets to
>  > deal with all the grotty details.
>
>What form is this intermediary parsed representation in? API, right? Then
>I need to clarify that when I say bytecode, I've meant whatever this
>intermediary parsed representation is, be that pure perl, API, or
>otherwise.

Okay, you're more confused here than I thought.

API = Application Program Interface (More or less. Something like that, at 
least). Basically the list of function calls and their parameters, perhaps 
with a set of rules around what can be called when. It's perfectly 
appropriate to have some things left as magic cookies at this point, like 
the syntax tree format, though their general characteristics can be specified.
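
In C terms a "magic cookie" is usually an opaque struct: callers hold a pointer and use accessor functions, and only the implementation sees the fields. A sketch along those lines, reusing the `parse_perl` shape quoted earlier in this thread. The body is a stand-in that merely counts semicolons, and `hir_new`, `hir_free`, `hir_statement_count` and the status names are hypothetical; `script` is declared `const` here, which differs from the quoted prototype.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct perl_interpreter perl_interpreter;  /* opaque to callers     */
struct HIR;                                        /* opaque "magic cookie" */

enum { PARSE_OK = 0, PARSE_FAILED = 1 };           /* hypothetical statuses */

/* Private definition: in a real layout this would sit in the parser's
 * own source file, invisible to callers. */
struct HIR {
    size_t statements;
};

struct HIR *hir_new(void)
{
    return calloc(1, sizeof(struct HIR));
}

void hir_free(struct HIR *hir)
{
    free(hir);
}

/* A "general characteristic" that can be specified without fixing
 * the tree format. */
size_t hir_statement_count(const struct HIR *hir)
{
    assert(hir != NULL);
    return hir->statements;
}

/* Stand-in body: counts ';' as statements instead of really parsing. */
long parse_perl(perl_interpreter *my_interp, const char *script,
                struct HIR *end_result, long flags)
{
    const char *p;

    (void)my_interp;   /* unused in this sketch */
    (void)flags;       /* unused in this sketch */
    if (script == NULL || end_result == NULL)
        return PARSE_FAILED;
    end_result->statements = 0;
    for (p = script; *p != '\0'; p++)
        if (*p == ';')
            end_result->statements++;
    return PARSE_OK;
}
```

The function list and its rules are the API; what sits inside `struct HIR` can be decided later without changing any caller.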

It might well be that we want to define the format of the syntax tree now 
as well, though I'd have preferred to leave that for a little later.

Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: To get things started...

2000-11-21 Thread David Grove


Dan Sugalski <[EMAIL PROTECTED]> wrote:

 > At 07:36 AM 11/21/00 -0500, David Grove wrote:
 > >However, one thing is seriously lacking in this theory... if the
 > >parser is perl, how does the perl parse? (Sort of a woodchuck chucking
 > >wood type of thing.) Somehow, the external parser API thingy has to
 > >know enough perl (through the chosen language) to be able to handle the
 > >parsing.
 >
 > Nope. We do it in two phases. The end result will not actually parse
 > perl code to build the parser (we'll provide bytecode for that) but to
 > start we can run the parser through perl 5 to get a syntax tree until
 > the perl 6 engine's capable of doing it itself.

Hmmm, that sounds familiar...



 > >To quote my perl elders, whatever can be done without regexen should
 > >be done with index() (within limits, since some regexen can be quite
 > >optimized).
 >
 > No, not really. regexes are generally easier to comprehend than their
 > index counterparts, and often faster. (There's a lot of code that needs
 > to go into backtracking...) While index might be better sometimes we
 > can't force folks to use it. Almost all of perl is up for grabs.

I won't argue the point as long as it works, the point being that we do it
with whatever method is capable of the greatest efficiency.

 > >The parser API needs to know both regexen and index() in order to
 > >work.
 >
 > The parser will have a fully-functional interpreter to work with. All
 > of perl will likely be there for it. (Modules and threads might not, but
 > that's still up in the air)

But that "interpreter" will be in the form of API, right?

 > >  > * The parser will have an active interpreter structure handy
 > >
 > >Is this the perl that parses the perl?
 >
 > Yup. In fact we might have two--the interpreter structure for the
 > interpreter running the parser, and the structure for the end-result
 > parsed program. Or we might just use one and squirrel all the
 > interpreter bits in a private (and deletable) namespace somewhere.

It's pretty clear that we're to purposely put in a distinct separation
between the two, unless I misunderstood Larry on this. I'm cautious about
dual-purposing anything here, since he said that this is a major problem
in Perl 5 today (the lack of flexibility between either end).

I'd like to ask for a clarification of the following terms as they apply
here:

1. External API
2. Internal API
3. Parser
4. Interpreter
5. What seems to be my "toplevel" parser (the creole parser)
6. Bytecode
7. Syntax Tree

And what language they should be in (if Larry's undefined language, just
say C-Larry or something), what they input, what they output, and what
they input from and output to in terms of the next level of
functionality. I think we're on the same wavelength, but not speaking the
same language.

I'd also like to offer an explanation. As I mentioned earlier, I've
already been working on a perlish to perl translator, so the "toplevel
creole parser" is particularly interesting to me as something that I've
basically already worked on, so it's where my head is. The different
output modes as well, because they're just the top turned upside down...
forgive my lack of attention to (and understanding of) the middleparts.

p




Re: To get things started...

2000-11-21 Thread David Grove

 > Okay, you're more confused here than I thought.

I can't deny that, but at least I helped get this group talking. The
silence was deafening.

Participation feels good though, when I'm not getting yelled at for being
technically inarticulate (P5P). Maybe if we can keep up the good
attitudes, we can swamp the P5P in terms of active participants...

Above all, thanks for your patience, Dan (and Simon).

;-))

p





Re: To get things started...

2000-11-21 Thread Dan Sugalski

At 01:04 PM 11/21/00 +, David Grove wrote:

>Dan Sugalski <[EMAIL PROTECTED]> wrote:
>
>  > At 07:36 AM 11/21/00 -0500, David Grove wrote:
>  > >However, one thing is seriously lacking in this theory... if the
>  > >parser is perl, how does the perl parse? (Sort of a woodchuck chucking
>  > >wood type of thing.) Somehow, the external parser API thingy has to
>  > >know enough perl (through the chosen language) to be able to handle
>  > >the parsing.
>  >
>  > Nope. We do it in two phases. The end result will not actually parse
>  > perl code to build the parser (we'll provide bytecode for that) but to
>  > start we can run the parser through perl 5 to get a syntax tree until
>  > the perl 6 engine's capable of doing it itself.
>
>Hmmm, that sounds familiar...

Sure. Compilers have been doing it for decades.

>  > >To quote my perl elders, whatever can be done without regexen should
>  > >be done with index() (within limits, since some regexen can be quite
>  > >optimized).
>  >
>  > No, not really. regexes are generally easier to comprehend than their
>  > index counterparts, and often faster. (There's a lot of code that needs
>  > to go into backtracking...) While index might be better sometimes we
>  > can't force folks to use it. Almost all of perl is up for grabs.
>
>I won't argue the point as long as it works, the point being that we do it
>with whatever method is capable of the greatest efficiency.

As long as everyone understands that efficiency doesn't necessarily mean 
the code that executes the fastest. While I want the parser fast, it is 
generally a one-shot thing, and if it takes an extra millisecond or twelve 
that probably doesn't make much difference. Cutting a day or twelve off of 
the preliminary development time, though, does matter rather a lot more.

>  > >The parser API needs to know both regexen and index() in order to
>  > >work.
>  >
>  > The parser will have a fully-functional interpreter to work with. All
>  > of perl will likely be there for it. (Modules and threads might not, but
>  > that's still up in the air)
>
>But that "interpreter" will be in the form of API, right?

No. The API is just a set of functions. I mean an interpreter, a real entity 
that can do something. Pretty much the same as an interpreter instance in 
perl 5.

>  > >  > * The parser will have an active interpreter structure handy
>  > >
>  > >Is this the perl that parses the perl?
>  >
>  > Yup. In fact we might have two--the interpreter structure for the
>  > interpreter running the parser, and the structure for the end-result
>  > parsed program. Or we might just use one and squirrel all the
>  > interpreter bits in a private (and deletable) namespace somewhere.
>
>It's pretty clear that we're to purposely put in a distinct separation
>between the two, unless I misunderstood Larry on this.

You probably misunderstood a little. I don't think Larry really cares how 
it works as long as it does. If the parser leaves a lot of cruft in the 
_Parser namespace it likely matters not.

>I'm cautious about
>dual-purposing anything here, since he said that this is a major problem
>in Perl 5 today (the lack of flexibility between either end).
>
>I'd like to ask for a clarification of the following terms as they apply
>here:
>
>1. External API

The functions presented to the world at large, including other parts of 
perl. (The bytecode compiler, the optimizer, and the interpreter, specifically)

>2. Internal API

The functions, hooks, and spots for hooks presented to the code inside the parser.

>3. Parser

The piece of perl that takes a stream of source and emits a syntax tree.

>4. Interpreter

The piece of perl that takes a chunk of bytecode and executes it.

>5. What seems to be my "toplevel" parser (the creole parser)

Got me there.

>6. Bytecode

Perl's machine code. The stuff that gets fed to the interpreter.

>7. Syntax Tree

The parsed, tokenized, and cleaned-up version of the source. See the dragon 
book (or any good compiler book) for more details.
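
To make the bytecode and interpreter definitions a little more tangible, here is a toy stack machine. The opcodes and the `run_bytecode` function are invented for this sketch and bear no relation to whatever perl's bytecode ends up looking like.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for a toy stack machine. */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

#define STACK_MAX 16

/* Execute code[0..n). On OP_HALT the top of the stack is stored in
 * *result. Returns 0 on success, -1 on malformed bytecode. */
int run_bytecode(const long *code, size_t n, long *result)
{
    long stack[STACK_MAX];
    size_t sp = 0, pc = 0;

    assert(code != NULL && result != NULL);
    while (pc < n) {
        switch (code[pc++]) {
        case OP_PUSH:                     /* operand follows the opcode */
            if (pc >= n || sp >= STACK_MAX)
                return -1;
            stack[sp++] = code[pc++];
            break;
        case OP_ADD:
            if (sp < 2)
                return -1;
            stack[sp - 2] += stack[sp - 1];
            sp--;
            break;
        case OP_MUL:
            if (sp < 2)
                return -1;
            stack[sp - 2] *= stack[sp - 1];
            sp--;
            break;
        case OP_HALT:
            if (sp < 1)
                return -1;
            *result = stack[sp - 1];
            return 0;
        default:
            return -1;                    /* unknown opcode */
        }
    }
    return -1;                            /* ran off the end without OP_HALT */
}
```

A syntax tree for `(2 + 3) * 4` would be flattened by a bytecode compiler into PUSH 2, PUSH 3, ADD, PUSH 4, MUL, HALT, and the interpreter above only ever sees that flat sequence.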

>And what language they should be in (if Larry's undefined language, just
>say C-Larry or something)

The parser should be mostly perl. The rest will be in a mix of something 
Cish and perl. (The Cish stuff will likely be run through a perl filter to 
produce real C, though it'll hopefully have features in it that'll rein in 
some of C's more error-prone features)


Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: To get things started...

2000-11-21 Thread David Grove

Thanks for the clarifications, Simon.

Simon Cozens <[EMAIL PROTECTED]> wrote:

 > On Tue, Nov 21, 2000 at 07:36:11AM -0500, David Grove wrote:
 > >  > 1) The API presented to the rest of the world. This is likely one
 > >  > call,
 > >
 > > These are almost two separate things entirely. (I don't get the "one
 > > call" thing. What do you mean?)
 >
 > A parser does, essentially, one single thing: it takes text and turns
 > it into an op tree. That's the only call you need to make from an
 > external perspective.

I'm not sure that it's possible to do this, or desirable. If Larry wants
Perl to use different modes, creoles, or ways of interpreting or
understanding the "perl" language, then we have to let the parser have a
bit more information. This includes the ability to tell it what creole
it's currently interpreting (it will probably need a stack for that, since
I can foresee people trying "use tclish" within a "use pythonic", unless
one overrides the other and turns off the previous mode, in which case it
just needs to know its current mode). It also needs to know where to get
its information. If we want a small kernel, then we can't give it
information about how to parse the different modes within the micro-kernel
itself. It would have to be bound to the kernel, or loaded as a file.
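
The stack mentioned above could be as small as the following sketch, which assumes modes are identified by name and that the base language can never be popped. `mode_stack` and its functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODE_STACK_MAX 8

/* Hypothetical stack of active syntax modes ("creoles"). */
typedef struct {
    const char *names[MODE_STACK_MAX];
    size_t      depth;
} mode_stack;

void mode_init(mode_stack *ms, const char *base)
{
    assert(ms != NULL && base != NULL);
    ms->names[0] = base;
    ms->depth = 1;
}

/* Entering "use <name>" inside the current mode. */
int mode_push(mode_stack *ms, const char *name)
{
    if (ms->depth >= MODE_STACK_MAX)
        return -1;
    ms->names[ms->depth++] = name;
    return 0;
}

/* Leaving the scope of the innermost "use"; the base mode stays. */
int mode_pop(mode_stack *ms)
{
    if (ms->depth <= 1)
        return -1;
    ms->depth--;
    return 0;
}

const char *mode_current(const mode_stack *ms)
{
    return ms->names[ms->depth - 1];
}
```

If one mode simply overrides the previous one, the stack collapses to a single "current mode" variable, which is the alternative raised in the paragraph above.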

If we were simply feeding it perl with a single syntax, we could get away
with a "one call" scheme. But since we're dealing with almost certainly
mutually exclusive syntax and semantics, it probably needs more
information.

As a point of clarification, I am seeing the external parser as that part
of perl that sees the user's script directly.

 > > the external API needs to be flexible to handle perl in different
 > > writing styles
 >
 > This doesn't need to be the case; the external API may be
 > language-agnostic, with the language rules set by internal calls.

Then I'm misunderstanding the difference between external and internal. If
external touches the user's script, it can't separate itself from whatever
particular syntax is currently in use. The internal portion of the parser
that I suppose I've now proposed is what can be user-syntax independent.
I'm seeing the external api as being the part that receives the user's
script from different modes and turns it into an intermediary form,
and the external as what takes the intermediary form and turns it
into whatever form of output that we've chosen. From what I understand of
Larry's desires for the language, we need multiple possible ways to input,
and multiple possible ways to output, but internally we need that language
agnostic thing.

Maybe I'm going beyond the purpose of "API". Let me know if this is the
case.

 > >  > * The parser needs to be reentrant
 > > No clue what this means. I need this defined in context.
 >
 > While parsing text, you should be able to dive into a separate bit of
 > text, parse that, ("re-enter" the parser's routines) come out and carry
 > on *exactly* where you left off, without your state being lost.

Thanks. That's basically what it sounded like. Can you give an example? I
mean, are we expressing the need for do {BLOCK} with this, or threads, or
multiplicity, or something else?

 > > perl6  perl5  python  tclish
 > >    \     \      /      /
 > >     \     \    /      /
 > >   ---------------------------
 > >   READSTDIN and other commons
 > >         full tree here
 > >   ---------------------------
 > >               |
 > >               | <- required
 > >               |
 > >   ---------------------------
 > >             OPCODES
 > >   ---------------------------
 > >      /      /    \      \
 > >     /      /      \      \
 > >   run   store     exe      a
 > >   bc     bc     binary   java thingy
 >
 > I think you've just invented the compiler! :)

I don't think so. In a compiler I don't believe that the intermediate step
is there, and I've never seen any compiler accept multiple input semantics
and multiple output (meaning binary, bytecode, java, c#) (okay, C++
Builder can accept pascal... but that's an one). However, I've foreseen
the output of compiled code as a part of this. The desire there was to
make it easier to make a compiler, or at least possible to output
executable code as an output mode. The exe-binary is actually, of course,
several different pidgins within the creole, since we're outputting to
Linux, Solaris, Win32, etc. ad nauseam...

Keep in mind that I don't have a clear definition for the intermediate
step except that it is desired to separate the external from the internal
as I understand them. (For now, I'm conceptualizing it as a one-to-one
that can change without hurting the internal or external, solely for the
purpose of the desired flexibility and separation.) But then, I've yet to
see whether I'm understanding them. I also realize I'm off the topic of
the bytecode itself, but I'm not sure how much bytecode I can apply to an
undefined language. Can somebody let me know if any of what I've said is
relevant?

Re: To get things started...

2000-11-21 Thread Sam Tregar

On Tue, 21 Nov 2000, David Grove wrote:

> If we were simply feeding it perl with a single syntax, we could get away
> with a "one call" scheme. But since we're dealing with almost certainly
> mutually exclusive syntax and semantics, it probably needs more
> information.

Perhaps the "one call" can take some arguments?  I suppose it would need
to know what kind of syntax to expect.

> Larry's desires for the language, we need multiple possible ways to input,
> and multiple possible ways to output, but internally we need that language
> agnostic thing.

Bytecode, right?

> I don't think so. In a compiler I don't believe that the intermediate step
> is there

It definitely is.  Few optimizations are possible without an intermediate
representation of some kind!

> , and I've never seen any compiler accept multiple input semantics

GCC - recently renamed the "Gnu Compiler Collection" for a reason!

> and multiple output (meaning binary, bytecode, java, c#) (okay, C++

This is also not uncommon - you can look at cross-compilers as one example.
Java compilers that can produce bytecode and native code are another.

> Can somebody let me know if any of what I've said is relevant?

Highly relevant, but also somewhat "known".  I think you would be
interested in reading a good book on compiler design.  The dragon-book is
a perennial favorite, although there might be more up-to-date material
available these days.  At least, I hope there is!

-sam





Re: To get things started...

2000-11-21 Thread Simon Cozens

On Tue, Nov 21, 2000 at 10:37:23AM +, David Grove wrote:
> I'm not sure that it's possible to do this, or desirable. If Larry wants
> Perl to use different modes, creoles, or ways of interpreting or
> understanding the "perl" language, then we have to let the parser have a
> bit more information.

Yes, but these don't have to be external level calls.

> As a point of clarification, I am seeing the external parser as that part
> of perl that sees the user's script directly.
 
Likewise.

This is why the "creole" rulesets are *not* external calls.

Syntax definitions, in the form of Perl programs or pragmata, will have to go
through two stages before they can be used. First, they have to be parsed as
Perl code. This is a call to the external API of the parser. 

Once this is done, the resulting op tree must be processed so that it can be
turned into a data structure (representing the grammar) which can be
understood by the parser. 

So, something's gone into the parser, and the parser has determined that this
is a language definition - the parser then passes it off to the
grammar-processor which constructs a grammar for it, and hands the grammar
back to the parser. At this level, it is not "seeing the user's script
directly", and thus I would say that the communication between the
grammar-processor and the parser was an internal level API.

> I don't think so. In a compiler I don't believe that the intermediate step
> is there,

I really *would* recommend Aho, Sethi, Ullman, "Compilers: Principles,
Techniques and Tools".

> and I've never seen any compiler accept multiple input semantics
> and multiple output (meaning binary, bytecode, java, c#) (okay, C++
> Builder can accept pascal... but that's an one).

gcc is a compiler which can receive C, C++, Objective C, and Fortran
input, and produce output for quite an array of architectures.

-- 
A successful [software] tool is one that was used to do something
undreamed of by its author.
-- S. C. Johnson



Re: To get things started...

2000-11-21 Thread Nicholas Clark

On Mon, Nov 20, 2000 at 06:01:52PM -0500, Dan Sugalski wrote:
> * The parser will be written mostly in perl, so you have regexes and such 
> to work with

> * It's possible that the whole set of parsing rules may change on the fly, 
> so don't get hung up on constants like "{"--stick to symbolic things like 
> start_scope instead

A thought strikes me. A few perl constructions ('', "", q(), qq() offhand,
possibly others) can contain embedded newlines.
A regular expression to match "" strings ( /"([^\\"]|\\.)*"/s ) is assuming
that it has all the characters needed to match already in memory.

A parser written in C typically sees the opening " and goes into a loop
munching characters from the input until it meets the closing ". The input
may be line buffered (as in current perl) but if the buffer runs out before
the closing " it is refilled with another line as often as needed.
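
The loop described in the previous paragraph looks roughly like this. It is a sketch, not perl's tokenizer: the input is modelled as an array of lines, and `line_reader`, `lr_getc` and `scan_dq_string` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical line-buffered input: lines are handed over one at a time. */
typedef struct {
    const char **lines;    /* stand-in for the script's file handle */
    size_t       n_lines;
    size_t       next;     /* index of the next line to load        */
    const char  *buf;      /* current line, or NULL before first    */
    size_t       pos;      /* offset into buf                       */
} line_reader;

/* Return the next character, refilling the buffer with another line
 * whenever it runs out. Returns -1 at end of input. */
int lr_getc(line_reader *lr)
{
    while (lr->buf == NULL || lr->buf[lr->pos] == '\0') {
        if (lr->next >= lr->n_lines)
            return -1;
        lr->buf = lr->lines[lr->next++];
        lr->pos = 0;
    }
    return (unsigned char)lr->buf[lr->pos++];
}

/* Call just after the opening '"' has been consumed. Copies the string
 * body into out, keeping an escaped character literally. Returns the
 * body length, or -1 if the string is unterminated or too long. */
long scan_dq_string(line_reader *lr, char *out, size_t out_size)
{
    size_t n = 0;
    int c;

    assert(lr != NULL && out != NULL && out_size > 0);
    while ((c = lr_getc(lr)) != -1) {
        if (c == '"') {
            out[n] = '\0';
            return (long)n;
        }
        if (c == '\\') {
            c = lr_getc(lr);
            if (c == -1)
                break;
        }
        if (n + 1 >= out_size)
            return -1;
        out[n++] = (char)c;
    }
    return -1;
}
```

The scanner never needs the whole string in memory before it starts, which is exactly the property a regex applied to an in-memory scalar lacks.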

How is our quoted string matcher going to work in the face of strings
containing embedded literal newlines?

Are we hoping that we can mmap() most scripts, so read isn't hugely a
  problem? And slurp the rest in one? [doesn't feel good]
Are we going to have "lazy scalars" which collude with the regexp engine
  so that if the regexp engine hits the current end more is read from
  the file handle?
Something else?
Or is this not-a-problem for some reason I've not thought of?

Nicholas Clark




Re: To get things started...

2000-11-21 Thread Sam Tregar

On Wed, 22 Nov 2000, Nicholas Clark wrote:

> Are we hoping that we can mmap() most scripts, so read isn't hugely a
>   problem? And slurp the rest in one? [doesn't feel good]
> Are we going to have "lazy scalars" which collude with the regexp engine
>   so that if the regexp engine hits the current end more is read from
>   the file handle?
> Something else?

Perhaps we could add a mode to the regex engine like:

   $filehandle =~ /.../;

Where the engine itself would do the reading and buffering.  Ok, that
might not be such a good idea...  This probably never returns, eh:

   $filehandle =~ /(.*)/;

However we solve the problem I hope we can allow Perl programmers access
to the solution.  This is a very common problem with regex parsers.

-sam





Re: To get things started...

2000-11-21 Thread Dan Sugalski

At 11:45 PM 11/21/00 +, Tom Hughes wrote:
>In message <[EMAIL PROTECTED]>
>   Dan Sugalski <[EMAIL PROTECTED]> wrote:
>
> > At 10:18 AM 11/21/00 -0800, Benjamin Stuhl wrote:
> >
> > >Well, it would (IMHO) make more sense to have
> > >perl6_parse_script (I do tend to follow
> > >{subsystem,verb,object} naming...)
> >
> > Or Perl$parse_script, but that's a matter of taste, I suppose. :)
>
>Given that it isn't a valid C identifier, yes... Unless you're
>using VAXC or DECC of course, which was your point I assume ;-)

Odd. The Dec C docs don't mention it as a problem, and both Dec C on VMS 
and GCC on a linux box take it without complaint. They might've slipped it 
in as valid in the final ANSI standard or something. (I can't dig up my 
ANSI K&R to check, unfortunately)

So it wasn't actually my point, though I'm fine with avoiding $ in 
identifiers, since I expect some platforms will be rather unhappy with it. 
(And other languages may well have restrictions that wouldn't allow it--I 
don't know if COBOL or Fortran are OK with dollar signs...)

Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: To get things started...

2000-11-21 Thread Jarkko Hietaniemi

On Tue, Nov 21, 2000 at 09:39:16PM -0500, Dan Sugalski wrote:
> At 11:45 PM 11/21/00 +, Tom Hughes wrote:
> >In message <[EMAIL PROTECTED]>
> >   Dan Sugalski <[EMAIL PROTECTED]> wrote:
> >
> > > At 10:18 AM 11/21/00 -0800, Benjamin Stuhl wrote:
> > >
> > > >Well, it would (IMHO) make more sense to have
> > > >perl6_parse_script (I do tend to follow
> > > >{subsystem,verb,object} naming...)
> > >
> > > Or Perl$parse_script, but that's a matter of taste, I suppose. :)
> >
> >Given that it isn't a valid C identifier, yes... Unless you're
> >using VAXC or DECC of course, which was your point I assume ;-)
> 
> Odd. The Dec C docs don't mention it as a problem, and both Dec C on VMS 
> and GCC on a linux box take it without complaint. They might've slipped it 
> in as valid in the final ANSI standard or something. (I can't dig up my 
> ANSI K&R to check, unfortunately)

Crank up the warnings to strict ANSI and even DEC C moans.  At least on
Digital UNIX it does.

$ cat x.c
static int foo$bar = 42;
$ cc -c -std1 x.c
cc: Warning: x.c, line 1: Extension: A '$' was encountered in an identifier.
static int foo$bar = 42;
---^
$ 

-- 
$jhi++; # http://www.iki.fi/jhi/
# There is this special biologist word we use for 'stable'.
# It is 'dead'. -- Jack Cohen



Re: To get things started...

2000-11-21 Thread David Grove

I'm still not sure where to start from a technical standpoint, so I'll
just comment and brainstorm until someone more used to this tells me
whether my common cents should be in US Dollars or South African ZAR.
Please forgive a bit of rambling, I'm not purposely off topic if I am.


Dan Sugalski <[EMAIL PROTECTED]> wrote:

 > This list is here to design the internal and external API for the 
 > parser/tokenizer/lexer part of perl. Basically we need two bits:
 > 
 > 1) The API presented to the rest of the world. This is likely one call,
 > though if folks want to split it out for external and internal use,
 > that's fine.
 >
 > 2) The internal API. These are the places where hooks can be installed,
 > or bits of the parser that those hooks can call back into the parser. (Or
 > parser/lexer/tokenizer utility routines the hooks can call)

These are almost two separate things entirely. (I don't get the "one call"
thing. What do you mean?) First of all, if we take what Larry said and try
to conceptualize it in terms of a parser, the external API needs to be
flexible to handle perl in different writing styles... creoles I'd call
them, since I think Larry would appreciate that term. (Amateur philologist
here.) The external parser needs to be almost user configurable to
accomplish this. Rather than simple, this is actually quite complex, since
the external api needs to be able to take directions from many creoles and
filter them into something that the internal parser can understand. I
foresee as many mappings to internals in the external parser as the
internal parser has to bytecode in the new perlguts. The external API
needs to know what to map to where, and how. This is where the regexen
basically come in, I think. (Read comments on index() vs regexen below).
The API that I'm seeing, and I'm not particularly inventive in this area,
is a perl hash-type structure mapping regexen to perlguts, where the
particular mappings are determined by pragmata:

use pythonic;
use javanese;
use tclish (:teehee);
use hungarian;
use forth; # drink fifth

I also don't believe that this outer layer needs to be particularly
intelligent when it comes to knowing perl's internals, but I do believe
that it has to have a mind of its own if we're to provide the promised
capabilities of alternate input styles.

%PL_API_EX = (
  'perl6' => {
    'PRINTCHAR' =>
      [OPTYPE_RX, '\bprint\b\s+(\w+\s+)(??{PL_STRING_LIST})'],
    'READSTDIN' =>
      [OPTYPE_IX, ''],
  },
);

In this, I'm trying (with extreme and admittedly clumsy effort) to express
that the perl6 (default) creole understands that in order to get to the
PRINTCHAR internal API, it does a regular expression search (with an
embedded function to find the nether end of the print command and use that
as a part of the regex).

Since we're doing this in perl and since we want a small core, this
appears to be a Config.pm type problem, where syntax is defined
externally, either in a module or some type of compiled thingy. Or, maybe
it would be appropriate to go the Linux Kernel route, and decide at
compile time what is in the "kernel" and what is loaded as a "module".
(Hey, that sounds good for some PDD somewhere else).

Now, the internal is actually the less brainy. It basically just needs to
provide a commonality that the external API will connect to when using any
creole. Mapping to bytecodes is beyond my skill when discussing a
theoretical language, however.

I do think that it is important to make the distinction between the
external and internal modules. Larry made it clear that he wanted to
separate these, for flexibility on both ends. (Also good for PDDing, I
think.)

However, one thing is seriously lacking in this theory... if the parser is
perl, how does the perl parse? (Sort of a woodchuck chucking wood type of
thing.) Somehow, the external parser API thingy has to know enough perl
(through the chosen language) to be able to handle the parsing.

To parse this thing, it would seem that we need a third layer... a
C/C++/C-Larry parser (yylex, etc.). Once we have that, we can accomplish
the goals.

[GOALS]
EXTERNAL API:
1. Provide a multi-creole interface as a middleman between the programmer
and his language.
2. Provide a common interface (mapping) between the creole and the
internal API.
3. Write it in Perl.

INTERNAL API:
1. Expose the internal API to be used by the external API for use by the
creoles.
2. Provide a common interface (mapping) between the internal API and the
underlying language.
3. Write it in ...
4. Provide a mapping between the internal bytecodes and either internal
Perl or translation API (the C# and Java thingies)

[PROBLEMS]
1. Figure out how perl is going to parse perl without a perl to parse the
perl with (we need a base parser of some type). The perl "kernel" may need
to be defined as "just enough C to make perl parse". Larry did say that
he'd like to move the c library out of the kernel... We'd need the basic
data structures and regexen, and a basic bootstr

Re: To get things started...

2000-11-21 Thread Simon Cozens

On Tue, Nov 21, 2000 at 07:36:11AM -0500, David Grove wrote:
>  > 1) The API presented to the rest of the world. This is likely one call,
> 
> These are almost two separate things entirely. (I don't get the "one call"
> thing. What do you mean?)

A parser does, essentially, one single thing: it takes text and turns it into
an op tree. That's the only call you need to make from an external
perspective.

> the external API needs to be flexible to handle perl in different writing
> styles

This doesn't need to be the case; the external API may be language-agnostic,
with the language rules set by internal calls.

>  > * The parser needs to be reentrant
> No clue what this means. I need this defined in context.

While parsing text, you should be able to dive into a separate bit of text,
parse that, ("re-enter" the parser's routines) come out and carry on *exactly*
where you left off, without your state being lost.

> perl6  perl5  python  tclish
>    \      \     /      /
>     \      \   /      /
>  ----------------------------
>  READSTDIN and other commons
>        full tree here
>  ----------------------------
>              |
>              | <- required
>              |
>  ----------------------------
>            OPCODES
>  ----------------------------
>      /      /   \      \
>     /      /     \      \
>   run   store    exe      a
>   bc     bc     binary  java thingy

I think you've just invented the compiler! :)
 
-- 
It's difficult to see the picture when you are inside the frame.



Re: To get things started...

2000-11-21 Thread Dan Sugalski

At 10:37 AM 11/21/00 +, David Grove wrote:
>Thanks for the clarifications, Simon.
>
>Simon Cozens <[EMAIL PROTECTED]> wrote:
>If we were simply feeding it perl with a single syntax, we could get away
>with a "one call" scheme. But since we're dealing with almost certainly
>mutually exclusive syntax and semantics, it probably needs more
>information.

But we are. The call is probably going to be something like:

   status = parse_perl(perl_interpreter *my_interp,
                       char *script,
                       struct HIR *end_result,
                       long flags);

the fact that the script has a "use pythonish;" in it is entirely 
irrelevant--the program calls into the parser, which returns a status and 
possibly a parsed representation of the program. The parser gets to deal 
with all the grotty details.

>As a point of clarification, I am seeing the external parser as that part
>of perl that sees the user's script directly.

Mostly directly. There'll probably still be a level of indirection there, 
since we need to take into account embedding programs that may do odd things.

>  > > the external API needs to be flexible to handle perl in different
>  > > writing styles
>  >
>  > This doesn't need to be the case; the external API may be
>  > language-agnostic,
>  > with the language rules set by internal calls.
>
>Then I'm misunderstanding the difference between external and internal.

Yup, I think so, but that's OK.

The internal API also doesn't need to care about a lot of language-level 
stuff either. We need to take into account the fact that the rules on 
what's a scalar (or a block start, or comment, or whatever) may be dynamic, 
but the parse_token call is the parse_token call, regardless of the rules 
in effect.

>  If
>external touches the user's script, it can't separate itself from whatever
>particular syntax is currently in use.

Sure it can. It *must*, otherwise we'd need to rewrite parts of the parser 
every time we added another language variant. I don't want to have to 
rebuild perl just to use python mode. Yech.

>Maybe I'm going beyond the purpose of "API". Let me know if this is the
>case.

Yup. You've got the language rules mixed in with the API. Separate beast, 
and one we're not dealing with here.

>  > > perl6  perl5  python  tclish
>  > >    \      \     /      /
>  > >     \      \   /      /
>  > >  ----------------------------
>  > >  READSTDIN and other commons
>  > >        full tree here
>  > >  ----------------------------
>  > >              |
>  > >              | <- required
>  > >              |
>  > >  ----------------------------
>  > >            OPCODES
>  > >  ----------------------------
>  > >      /      /   \      \
>  > >     /      /     \      \
>  > >   run   store    exe      a
>  > >   bc     bc     binary  java thingy
>  >
>  > I think you've just invented the compiler! :)
>
>I don't think so. In a compiler I don't believe that the intermediate step
>is there, and I've never seen any compiler accept multiple input semantics
>and multiple output (meaning binary, bytecode, java, c#)

Pretty much everyone's compiler does this at this point. Gcc's already been 
mentioned, but Dec's compiler suites for VAX and Alpha do the same thing, 
and from the literature it looks like other folks do it too. There's a 
custom front end that produces an intermediate representation, and a common 
IR->object optimizing back-end.
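
To make that shape concrete, here is a toy sketch in C. Everything in it
(ir_node, front_infix, front_prefix, back_eval, back_emit) is made up for
illustration and has nothing to do with how gcc or the Dec compilers are
really built; it only shows two front ends filling in the same intermediate
representation and two back ends consuming it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* The shared intermediate representation: a single binary operation. */
typedef struct {
    char op;        /* '+', '-' or '*' */
    int  lhs, rhs;  /* operands */
} ir_node;

/* Front end 1: infix syntax, e.g. "3+4". Returns 0 on success. */
static int front_infix(const char *src, ir_node *ir)
{
    return sscanf(src, "%d %c %d", &ir->lhs, &ir->op, &ir->rhs) == 3 ? 0 : -1;
}

/* Front end 2: prefix syntax, e.g. "+ 3 4". Returns 0 on success. */
static int front_prefix(const char *src, ir_node *ir)
{
    return sscanf(src, " %c %d %d", &ir->op, &ir->lhs, &ir->rhs) == 3 ? 0 : -1;
}

/* Back end 1: evaluate the IR directly. */
static int back_eval(const ir_node *ir)
{
    switch (ir->op) {
    case '+': return ir->lhs + ir->rhs;
    case '-': return ir->lhs - ir->rhs;
    case '*': return ir->lhs * ir->rhs;
    default:  assert(!"unknown op"); return 0;
    }
}

/* Back end 2: emit text for an imaginary stack machine. */
static int back_emit(const ir_node *ir, char *out, size_t outsize)
{
    const char *name = ir->op == '+' ? "add" : ir->op == '-' ? "sub" : "mul";
    return snprintf(out, outsize, "push %d; push %d; %s",
                    ir->lhs, ir->rhs, name);
}
```

Neither back end knows which front end produced the ir_node, and adding a
third syntax means adding one more front_* function and nothing else.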



Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: To get things started...

2000-11-21 Thread Dan Sugalski

At 07:36 AM 11/21/00 -0500, David Grove wrote:
>However, one thing is seriously lacking in this theory... if the parser is
>perl, how does the perl parse? (Sort of a woodchuck chucking wood type of
>thing.) Somehow, the external parser API thingy has to know enough perl
>(through the chosen language) to be able to handle the parsing.

Nope. We do it in two phases. The end result will not actually parse perl 
code to build the parser (we'll provide bytecode for that) but to start we 
can run the parser through perl 5 to get a syntax tree until the perl 6 
engine's capable of doing it itself.

>[GOALS]
>EXTERNAL API:
>1. Provide a multi-creole interface as a middleman between the programmer
>and his language.
>2. Provide a common interface (mapping) between the creole and the
>internal API.
>3. Write it in Perl.

Yup.

>INTERNAL API:
>1. Expose the internal API to be used by the external API for use by the
>creoles.
>2. Provide a common interface (mapping) between the internal API and the
>underlying language.
>3. Write it in ...

Yup.

>4. Provide a mapping between the internal bytecodes and either internal
>Perl or translation API (the C# and Java thingies)

Nope. The syntax-tree to bytecode converter's a separate piece.

>  > The general rules of the game are:
>  >
>  > * The parser will be written mostly in perl, so you have regexes and
>  > such to work with
>
>To quote my perl elders, whatever can be done without regexen should be
>done with index() (within limits, since some regexen can be quite
>optimized).

No, not really. regexes are generally easier to comprehend than their index 
counterparts, and often faster. (There's a lot of code that needs to go into 
backtracking...) While index might be better sometimes we can't force folks 
to use it. Almost all of perl is up for grabs.

>The parser API needs to know both regexen and index() in order to work.

The parser will have a fully-functional interpreter to work with. All of 
perl will likely be there for it. (Modules and threads might not, but 
that's still up in the air)

>  > * The parser will have an active interpreter structure handy
>
>Is this the perl that parses the perl?

Yup. In fact we might have two--the interpreter structure for the 
interpreter running the parser, and the structure for the end-result parsed 
program. Or we might just use one and squirrel all the interpreter bits in 
a private (and deletable) namespace somewhere.

>Yeah, I'm going over this in order.
>Maybe I should have read the whole thing first.

Nah, y'think? :-P

>  > * The parser needs to be reentrant
>
>No clue what this means. I need this defined in context.

The parser needs to be able to call back into itself without screwing 
things up. Very little global state, in other words.
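
As a rough illustration in C (parse_state and parse_words are invented names,
and the "language" is just words and braces), keeping all state in a structure
that the caller owns is what makes the re-entry safe: the nested call gets its
own structure, and the outer one picks up exactly where it left off.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* All parser state lives here; there are no globals or statics. */
typedef struct {
    const char *pos;     /* current read position */
    int         words;   /* words seen at this nesting level */
    int         nested;  /* words seen inside nested { } blocks */
} parse_state;

/*
 * Count words until end of text or a closing '}'. On '{' the parser
 * re-enters itself with a fresh state for the nested text, then resumes
 * the outer parse just past the matching '}'.
 */
static void parse_words(parse_state *st)
{
    assert(st && st->pos);
    while (*st->pos != '\0' && *st->pos != '}') {
        if (*st->pos == '{') {
            parse_state inner = { st->pos + 1, 0, 0 };
            parse_words(&inner);              /* re-entry */
            st->nested += inner.words + inner.nested;
            st->pos = inner.pos;
            if (*st->pos == '}')
                st->pos++;                    /* step over the closing brace */
        } else if (isalnum((unsigned char)*st->pos)) {
            st->words++;
            while (isalnum((unsigned char)*st->pos))
                st->pos++;
        } else {
            st->pos++;                        /* skip separators */
        }
    }
}
```

Had words or pos been file-scope variables, the nested call would have
clobbered the outer parse's position and counts.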

>  > * The ultimate output of the parser will be a syntax tree
>
>I think I said that.

More or less. Perl will probably have two different intermediate 
representations, the parsed syntax tree and bytecode. The parser only spits 
out the syntax tree.


Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: To get things started...

2000-11-21 Thread Benjamin Stuhl

--- Dan Sugalski <[EMAIL PROTECTED]> wrote:
> At 10:37 AM 11/21/00 +, David Grove wrote:
> >Thanks for the clarifications, Simon.
> >
> >Simon Cozens <[EMAIL PROTECTED]> wrote:
> >If we were simply feeding it perl with a single syntax, we could get away
> >with a "one call" scheme. But since we're dealing with almost certainly
> >mutually exclusive syntax and semantics, it probably needs more
> >information.
> 
> But we are. The call is probably going to be something like:
> 
>    status = parse_perl(perl_interpreter *my_interp,
>                        char *script,
>                        struct HIR *end_result,
>                        long flags);

Well, it would (IMHO) make more sense to have
perl6_parse_script (I do tend to follow
{subsystem,verb,object} naming...) take a PerlIO*, so that
it is completely transparent parsing from a file or a
string. This gets almost into embedding issues, though (how
much of a libc does perl6 really need? perl5 now carries
large chunks of one around with it).

> the fact that the script has a "use pythonish;" in it is entirely
> irrelevant--the program calls into the parser, which returns a status and
> possibly a parsed representation of the program. The parser gets to deal
> with all the grotty details.
> 
[snip]

-- BKS




Re: To get things started...

2000-11-21 Thread Dan Sugalski

At 10:18 AM 11/21/00 -0800, Benjamin Stuhl wrote:
>--- Dan Sugalski <[EMAIL PROTECTED]> wrote:
> > At 10:37 AM 11/21/00 +, David Grove wrote:
> > >Thanks for the clarifications, Simon.
> > >
> > >Simon Cozens <[EMAIL PROTECTED]> wrote:
> > >If we were simply feeding it perl with a single syntax, we could get away
> > >with a "one call" scheme. But since we're dealing with almost certainly
> > >mutually exclusive syntax and semantics, it probably needs more
> > >information.
> >
> > But we are. The call is probably going to be something like:
> >
> >    status = parse_perl(perl_interpreter *my_interp,
> >                        char *script,
> >                        struct HIR *end_result,
> >                        long flags);
>
>Well, it would (IMHO) make more sense to have
>perl6_parse_script (I do tend to follow
>{subsystem,verb,object} naming...)

Or Perl$parse_script, but that's a matter of taste, I suppose. :)

>  take a PerlIO*, so that
>it is completely transparent parsing from a file or a
>string. This gets almost into embedding issues, though (how
>much of a libc does perl6 really need? perl5 now carries
>large chunks of one around with it).

I'm not sure we want a PerlIO* passed in, for embedding reasons. I can see 
embedding programs wanting to pass in a pointer to a string with the whole 
script, a filehandle of some sort, or a pointer to a function that produces 
the script in really odd cases. (Possibly with a second pointer in that 
case to misc data)

Maybe something like:

perl_parse(interp *interp,
           void *script,
           void *extra,
           syntree *parsed_perl,
           int flags);

where the flags arg indicates what sort of thing the script pointer is. Or 
perhaps:

perl_parse(interp *interp,
           void *script,
           syntree *parsed_perl,
           int flags,
           void *extra);

with the extra pointer and the flags argument vararg'd into optionality. 
(Defaulting to NULL and 0, respectively)
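
A rough sketch of how the flags could select the meaning of the script
pointer. The SRC_* values, source_fn, source_callback and fetch_script are
all placeholders made up for this example, not a proposed API. The callback
is wrapped in a struct because converting a void * straight to a function
pointer isn't portable C.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical low bits of `flags`: what kind of thing `script` points to. */
enum { SRC_STRING = 0, SRC_FILE = 1, SRC_CALLBACK = 2, SRC_MASK = 3 };

/* Callback: fills buf with up to len bytes, returns the count, 0 at end. */
typedef size_t (*source_fn)(void *extra, char *buf, size_t len);
typedef struct { source_fn fn; } source_callback;

/* Example callback: `extra` is a `const char **` cursor, 4 bytes per call. */
static size_t four_byte_chunks(void *extra, char *buf, size_t len)
{
    const char **cursor = extra;
    size_t n = strlen(*cursor);
    if (n > 4)   n = 4;
    if (n > len) n = len;
    memcpy(buf, *cursor, n);
    *cursor += n;
    return n;
}

/*
 * Pull the script text into `out`, interpreting `script` according to
 * `flags`. Returns the number of bytes fetched, or -1 on error.
 */
static long fetch_script(void *script, void *extra, int flags,
                         char *out, size_t outsize)
{
    size_t n = 0, got;

    assert(script && out && outsize > 0);
    switch (flags & SRC_MASK) {
    case SRC_STRING:                   /* script is a NUL-terminated string */
        n = strlen((const char *)script);
        if (n >= outsize)
            return -1;
        memcpy(out, script, n);
        break;
    case SRC_FILE:                     /* script is a FILE * */
        n = fread(out, 1, outsize - 1, (FILE *)script);
        break;
    case SRC_CALLBACK: {               /* script is a source_callback * */
        source_fn fn = ((source_callback *)script)->fn;
        while (n < outsize - 1 &&
               (got = fn(extra, out + n, outsize - 1 - n)) > 0)
            n += got;
        break;
    }
    default:
        return -1;                     /* unknown source kind */
    }
    out[n] = '\0';
    return (long)n;
}
```

With SRC_STRING as 0, a caller that passes no flags at all gets the plain
"pointer to the whole script" behaviour, which fits the defaulting above.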



Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: To get things started...

2000-11-21 Thread David Grove


Dan Sugalski <[EMAIL PROTECTED]> wrote:

 > At 10:37 AM 11/21/00 +, David Grove wrote:
 > >Thanks for the clarifications, Simon.
 > >
 > >Simon Cozens <[EMAIL PROTECTED]> wrote:
 > >If we were simply feeding it perl with a single syntax, we could get
 > >away with a "one call" scheme. But since we're dealing with almost
 > >certainly mutually exclusive syntax and semantics, it probably needs more
 > >information.
 >
 > But we are. The call is probably going to be something like:
 >
 >    status = parse_perl(perl_interpreter *my_interp,
 >                        char *script,
 >                        struct HIR *end_result,
 >                        long flags);
 >
 > the fact that the script has a "use pythonish;" in it is entirely
 > irrelevant--the program calls into the parser, which returns a status
 > and possibly a parsed representation of the program. The parser gets to
 > deal with all the grotty details.

What form is this intermediary parsed representation in? API, right? Then
I need to clarify that when I say bytecode, I've meant whatever this
intermediary parsed representation is, be that pure perl, API, or
otherwise.

p