Re: To get things started...

David Grove Tue, 21 Nov 2000 04:27:00 -0800
I'm still not sure where to start from a technical standpoint, so I'll
just comment and brainstorm until someone more used to this tells me
whether my common cents should be in US Dollars or South African ZAR.
Please forgive a bit of rambling; if I'm off topic, it isn't on purpose.


Dan Sugalski <[EMAIL PROTECTED]> wrote:

 > This list is here to design the internal and external API for the 
 > parser/tokenizer/lexer part of perl. Basically we need two bits:
 > 
 > 1) The API presented to the rest of the world. This is likely one call,
 > though if folks want to split it out for external and internal use, that's
 > fine.
 > 
 > 2) The internal API. These are the places where hooks can be installed,
 > or bits of the parser that those hooks can call back into the parser. (Or
 > parser/lexer/tokenizer utility routines the hooks can call)

These are almost two separate things entirely. (I don't get the "one call"
thing. What do you mean?) First of all, if we take what Larry said and try
to conceptualize it in terms of a parser, the external API needs to be
flexible to handle perl in different writing styles... creoles I'd call
them, since I think Larry would appreciate that term. (Amateur philologist
here.) The external parser needs to be almost user configurable to
accomplish this. Far from being simple, this is actually quite complex, since
the external API needs to be able to take directions from many creoles and
filter them into something that the internal parser can understand. I
foresee as many mappings to internals in the external parser as the
internal parser has to bytecode in the new perlguts. The external API
needs to know what to map to where, and how. This is where the regexen
basically come in, I think. (Read comments on index() vs regexen below).
The API that I'm seeing, and I'm not particularly inventive in this area,
is a perl hash-type structure mapping regexen to perlguts, where the
particular mappings are determined by pragmata:

use pythonic;
use javanese;
use tclish (:teehee);
use hungarian;
use forth; # drink fifth

I also don't believe that this outer layer needs to be particularly
intelligent when it comes to knowing perl's internals, but I do believe
that it has to have a mind of its own if we're to provide the promised
capabilities of alternate input styles.

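use constant { OPTYPE_RX => 'rx', OPTYPE_IX => 'ix' };  # regex vs. index()

our %PL_API_EX = (
  'perl6' => {                    # the default creole
    'PRINTCHAR' =>
      # PL_STRING_LIST() is a placeholder that returns the pattern
      # for the rest of the print list
      [OPTYPE_RX, qr/\bprint\b\s+(\w+\s+)(??{ PL_STRING_LIST() })/],
    'READSTDIN' =>
      [OPTYPE_IX, '<STDIN>'],     # plain string, found with index()
  },
);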

In this, I'm trying (with extreme and admittedly clumsy effort) to express
that the perl6 (default) creole understands that in order to get to the
PRINTCHAR internal API, it does a regular expression search (with an
embedded function to find the nether end of the print command and use that
as a part of the regex).
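For illustration only, here is the creole table above rendered as a small Python sketch (Python rather than Perl so it runs standalone). The op names PRINTCHAR and READSTDIN and the two match types come from the post. Reading OPTYPE_RX as "regex search" and OPTYPE_IX as "index() search", the `map_statement` helper, and the simplified print pattern standing in for the PL_STRING_LIST hook are all assumptions.

```python
import re
from typing import Optional

# Match types from the post: regex search vs. plain substring (index()) search.
OPTYPE_RX = "rx"
OPTYPE_IX = "ix"

# Hypothetical creole table: creole name -> internal op -> (match type, pattern).
# The print pattern is a simplified stand-in for the PL_STRING_LIST hook.
PL_API_EX = {
    "perl6": {
        "PRINTCHAR": (OPTYPE_RX, re.compile(r"\bprint\b\s+(.+?);")),
        "READSTDIN": (OPTYPE_IX, "<STDIN>"),
    },
}


def map_statement(creole: str, stmt: str) -> Optional[str]:
    """Return the first internal op whose pattern matches stmt, or None."""
    for op, (optype, pattern) in PL_API_EX[creole].items():
        if optype == OPTYPE_RX and pattern.search(stmt):
            return op
        if optype == OPTYPE_IX and stmt.find(pattern) != -1:
            return op
    return None


print(map_statement("perl6", 'print "hello";'))        # PRINTCHAR
print(map_statement("perl6", "my $line = <STDIN>;"))   # READSTDIN
```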

Since we're doing this in perl and since we want a small core, this
appears to be a Config.pm type problem, where syntax is defined
externally, either in a module or some type of compiled thingy. Or, maybe
it would be appropriate to go the Linux Kernel route, and decide at
compile time what is in the "kernel" and what is loaded as a "module".
(Hey, that sounds good for some PDD somewhere else).
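A rough sketch of the "Config.pm type problem", assuming each creole's table is a separately loadable unit and a pragma such as `use pythonic;` simply loads it (if needed) and makes it active. The `builtin` flag stands in for the Linux-kernel-style choice between compiled-in and loadable. The class, its methods, and the table contents are hypothetical.

```python
from typing import Callable, Dict

Table = Dict[str, str]  # surface keyword -> internal op (hypothetical shape)


class CreoleRegistry:
    """Holds creole tables; "kernel" ones load up front, "modules" load on first use."""

    def __init__(self) -> None:
        self._loaders: Dict[str, Callable[[], Table]] = {}
        self._loaded: Dict[str, Table] = {}
        self.active = "perl6"  # the default creole

    def register(self, name: str, loader: Callable[[], Table], builtin: bool = False) -> None:
        self._loaders[name] = loader
        if builtin:  # decided at build time: part of the "kernel"
            self._loaded[name] = loader()

    def use(self, name: str) -> Table:
        """Handle a pragma such as `use pythonic;`: load if needed, then activate."""
        if name not in self._loaded:
            self._loaded[name] = self._loaders[name]()
        self.active = name
        return self._loaded[name]

    def is_loaded(self, name: str) -> bool:
        return name in self._loaded


registry = CreoleRegistry()
registry.register("perl6", lambda: {"print": "PRINTCHAR"}, builtin=True)
registry.register("pythonic", lambda: {"print": "PRINTCHAR", "input": "READSTDIN"})
registry.register("tclish", lambda: {"puts": "PRINTCHAR", "gets": "READSTDIN"})

table = registry.use("pythonic")
print(registry.active, table["input"])  # pythonic READSTDIN
```

Here "tclish" is registered but never loaded, since nothing asked for it.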

Now, the internal API is actually the less brainy of the two. It basically
just needs to
provide a commonality that the external API will connect to when using any
creole. Mapping to bytecodes is beyond my skill when discussing a
theoretical language, however.

I do think that it is important to make the distinction between the
external and internal modules. Larry made it clear that he wanted to
separate these, for flexibility on both ends. (Also good for PDDing, I
think.)

However, one thing is seriously lacking in this theory... if the parser is
perl, how does the perl parse? (Sort of a woodchuck chucking wood type of
thing.) Somehow, the external parser API thingy has to know enough perl
(through the chosen language) to be able to handle the parsing.

To parse this thing, it would seem that we need a third layer... a
C/C++/C-Larry parser (yylex, etc.). Once we have that, we can accomplish
the goals.

[GOALS]
EXTERNAL API:
1. Provide a multi-creole interface as a middleman between the programmer
and his language.
2. Provide a common interface (mapping) between the creole and the
internal API.
3. Write it in Perl.

INTERNAL API:
1. Expose the internal API to be used by the external API for use by the
creoles.
2. Provide a common interface (mapping) between the internal API and the
underlying language.
3. Write it in ...
4. Provide a mapping between the internal bytecodes and either internal
Perl or translation API (the C# and Java thingies)

[PROBLEMS]
1. Figure out how perl is going to parse perl without a perl to parse the
perl with (we need a base parser of some type). The perl "kernel" may need
to be defined as "just enough C to make perl parse". Larry did say that
he'd like to move the C library out of the kernel... We'd need the basic
data structures and regexen, and a basic bootstrap mechanism. The core
functions (perl standard internal functions) need to know when they're
called, though, but perl can do that... We don't want to #include <stdio.pm>;
that's icky (and won't run perl5 scripts).
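One hedged way to picture the bootstrap: a tiny fixed reader (standing in for "just enough C to make perl parse") understands only a trivial rule format. It reads the rule definitions, and those rules then build the real tokenizer. The sketch is in Python for brevity; the `NAME := regex` rule format and both function names are invented for illustration.

```python
import re
from typing import Callable, List, Tuple

# Rules the full tokenizer is built from. In the bootstrap story these would
# live outside the kernel; the "NAME := regex" format is hypothetical.
RULES_SOURCE = """
START_SCOPE := \\{
END_SCOPE   := \\}
IDENT       := [A-Za-z_]\\w*
NUMBER      := \\d+
"""


def stage0_read_rules(src: str) -> List[Tuple[str, str]]:
    """Stage 0: fixed kernel reader that only understands 'NAME := regex' lines."""
    rules = []
    for line in src.splitlines():
        if ":=" in line:
            name, pattern = line.split(":=", 1)
            rules.append((name.strip(), pattern.strip()))
    return rules


def stage1_tokenizer(rules: List[Tuple[str, str]]) -> Callable[[str], List[Tuple[str, str]]]:
    """Stage 1: build the real tokenizer from the rules that stage 0 read."""
    master = re.compile("|".join(f"(?P<{n}>{p})" for n, p in rules))

    def tokenize(text: str) -> List[Tuple[str, str]]:
        # Characters that match no rule (here, whitespace) are skipped.
        return [(m.lastgroup, m.group()) for m in master.finditer(text)]

    return tokenize


tokenize = stage1_tokenizer(stage0_read_rules(RULES_SOURCE))
print(tokenize("{ x 42 }"))
# [('START_SCOPE', '{'), ('IDENT', 'x'), ('NUMBER', '42'), ('END_SCOPE', '}')]
```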

[EXCLAMATION]
Hey, I'm just brainstorming here. Laugh if you need to...

 > The general rules of the game are:
 > 
 > * The parser will be written mostly in perl, so you have regexes and such
 > to work with

To quote my perl elders, whatever can be done without regexen should be
done with index() (within limits, since some regexen can be quite
optimized).

The parser API needs to know both regexen and index() in order to work.
One of Python's selling points (don't ask) is that its users think of it as
more of an application programming language. I foresee Perl 6 trying to fit
into that niche. Perl programs can be expected, or should be allowed, to
grow quite large, much more so than they are today. The more we do with
regexen, the longer the compile process.
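A sketch of the index()-first rule, assuming the choice is made per pattern by checking whether the pattern contains any regex metacharacters. Python's `str.find` plays the part of Perl's index(); `REGEX_META` and `find_token` are illustrative names, not part of any existing API.

```python
import re

# Characters that give a pattern regex meaning; anything else is a plain literal.
REGEX_META = set(".^$*+?{}[]()|\\")


def find_token(text: str, pattern: str) -> int:
    """Return the offset of the first match, or -1, using index()-style search when possible."""
    if not any(ch in REGEX_META for ch in pattern):
        return text.find(pattern)      # cheap literal scan, like Perl's index()
    m = re.search(pattern, text)       # fall back to the regex engine
    return m.start() if m else -1


print(find_token("my $x = <STDIN>;", "<STDIN>"))    # 8
print(find_token("print  $x;", r"\bprint\b\s+"))    # 0
```

The metacharacter check could presumably be done once, when a creole table is loaded, rather than on every search.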

 > * The parser will have an active interpreter structure handy

Is this the perl that parses the perl? Yeah, I'm going over this in order.
Maybe I should have read the whole thing first.

 > * The parser needs to be reentrant

No clue what this means. I need this defined in context.
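For context, "reentrant" usually means the parser can safely be entered again before an earlier call has returned. That happens, for instance, when a hook running mid-parse calls back into the parser on a nested snippet, or when two interpreters parse at the same time. The common way to get there is to keep parse state in the call or the parser object rather than in globals. The sketch below assumes that approach; `Parser`, `nest_hook`, and the comma-separated nested snippet are made up.

```python
from typing import Callable, Dict, List


class Parser:
    """All parse state lives in the call (or the instance), never in globals."""

    def __init__(self, hooks: Dict[str, Callable[["Parser", str], List]]) -> None:
        self.hooks = hooks

    def parse(self, text: str) -> List:
        words = text.split()
        pos = 0                      # local cursor: a nested call gets its own
        tree: List = []
        while pos < len(words):
            word = words[pos]
            if word in self.hooks and pos + 1 < len(words):
                # The hook may call back into parse() before this call returns.
                tree.append(self.hooks[word](self, words[pos + 1]))
                pos += 2
            else:
                tree.append(word)
                pos += 1
        return tree


def nest_hook(parser: Parser, arg: str) -> List:
    # Re-enters the same parser on a sub-snippet while the outer parse is running.
    return parser.parse(arg.replace(",", " "))


p = Parser({"nest": nest_hook})
print(p.parse("x nest a,b y"))  # ['x', ['a', 'b'], 'y']
```

If `pos` were a module-level global instead, the nested call would move the outer call's cursor and the outer parse would resume in the wrong place.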

 > * The ultimate output of the parser will be a syntax tree

I think I said that. Maybe I'm thinking good thinks. I'm thinking like
this (forgive the monospaced layout if you read mail in Arial or Times or
whatever):

perl6  perl5  python  tclish
\      \           /       /
 \      \         /       /
 ---------------------------
 READSTDIN and other commons
       full tree here
 ---------------------------
             |
             | <- required
             |
 ---------------------------
          OPCODES
 ---------------------------
 /     /            \      \
/     /              \      \
run  store           exe    a
bc   bc           binary    java thingy

Again, the separation between the commonality and the opcodes is something
that Larry wants for flexibility in either end. And Larry said, let there
be a firmament in the midst of the APIs, to separate the APIs above from
the APIs below, so we can fiddle with either end without making airy mud.

In this model, both the external and internal APIs are replaceable by
creoles. (Hmmm, a possibility for a PDD?) We want to output either
bytecode or Java. I think this goes at the internal level as a replaceable
model. The gist is, what goes in can change, and what comes out can
change, depending on what creole you want on either end. I'm thinking of
the outputs... either runnable, or storable, or translated, or compiled,
etc.
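The layered diagram and the replaceable-ends idea can be sketched as a pipeline with swappable front ends (creoles) and back ends (outputs) around a fixed middle. Only PRINTCHAR comes from the post; the two toy front ends, the opcode number, and the "run"/"store" back ends are assumptions, and each front end handles nothing but print statements.

```python
from typing import Callable, List, Tuple

Node = Tuple[str, str]        # (common op, argument): the shared "full tree" layer, flattened
Code = List[Tuple[int, str]]  # (opcode, argument)

OPCODES = {"PRINTCHAR": 0x01}  # the required middle layer; the number is made up


def perl6_front(src: str) -> List[Node]:
    # 'print "hi"; print "yo";' -> one PRINTCHAR node per statement
    return [("PRINTCHAR", stmt.strip()[len("print"):].strip())
            for stmt in src.split(";") if stmt.strip()]


def pythonic_front(src: str) -> List[Node]:
    # 'print("hi")' on each line -> one PRINTCHAR node per line
    return [("PRINTCHAR", line.strip()[len("print("):-1])
            for line in src.splitlines() if line.strip()]


def to_opcodes(tree: List[Node]) -> Code:
    return [(OPCODES[op], arg) for op, arg in tree]


def run_backend(code: Code) -> List[str]:
    # Stand-in for "run bc": just collect what would be printed.
    return [arg for _, arg in code]


def store_backend(code: Code) -> str:
    # Stand-in for "store bc": serialize opcodes as text.
    return ";".join(f"{op:02x}:{arg}" for op, arg in code)


def translate(src: str, front: Callable[[str], List[Node]],
              back: Callable[[Code], object]) -> object:
    return back(to_opcodes(front(src)))


print(translate('print "hi"; print "yo";', perl6_front, run_backend))   # ['"hi"', '"yo"']
print(translate('print("hi")\nprint("yo")', pythonic_front, store_backend))  # 01:"hi";01:"yo"
```

Adding a creole means writing one more front end; adding an output (a Java translator, say) means one more back end. Neither touches `to_opcodes`.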

 > * It's OK to store stuff in the interpreter structure. (Which will have the
 > full perl global variable stash in it)

If we couldn't, we couldn't change creoles.

 > * The keywords the parser uses may be changed on the fly, and the parser
 > needs to handle this.

Hah, saw this coming. Maybe I'm doing this right after all.

 > * It's possible that the whole set of parsing rules may change on the fly,
 > so don't get hung up on constants like "{"--stick to symbolic things like
 > start_scope instead

Nope, I'm off. But I think I have a good model anyways.
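The "symbolic things like start_scope" rule can be sketched as a lexer whose spellings live in a mutable table, while the parser only compares token kinds. Rebinding the scope tokens then changes the surface syntax without any change to the parser. The sketch assumes whitespace-separated tokens; apart from start_scope, the names are hypothetical.

```python
from typing import Dict, List


class Lexer:
    def __init__(self) -> None:
        # symbolic kind -> current spelling; may be changed on the fly
        self.spelling: Dict[str, str] = {"START_SCOPE": "{", "END_SCOPE": "}"}

    def rebind(self, kind: str, text: str) -> None:
        self.spelling[kind] = text

    def tokens(self, src: str) -> List[str]:
        # Rebuilt on every call, so a rebind applies to the next call.
        lookup = {text: kind for kind, text in self.spelling.items()}
        return [lookup.get(word, "WORD") for word in src.split()]


def max_depth(kinds: List[str]) -> int:
    """Parser side: knows only START_SCOPE/END_SCOPE, never '{' or '}'."""
    depth = best = 0
    for kind in kinds:
        if kind == "START_SCOPE":
            depth += 1
            best = max(best, depth)
        elif kind == "END_SCOPE":
            depth -= 1
    return best


lex = Lexer()
print(max_depth(lex.tokens("{ a { b } }")))               # 2
lex.rebind("START_SCOPE", "begin")
lex.rebind("END_SCOPE", "end")
print(max_depth(lex.tokens("begin a begin b end end")))   # 2
```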
 
 > It's distinctly possible the deadline will need to be extended, or that
 > the resulting PDD will be really simple. Either's OK.

At least until I figure out what we want here.

 > If someone else wants to hold the WG chair that's just fine with me. I will
 > add that parsing/lexing/tokenizing in general, and in the way perl needs
 > it, is *not* anywhere near my area of expertise. While I can specify the
 > external and inter-module interface just fine, the actual internals of the
 > parser aren't my thing. (Not that it's necessary for handling the group
 > chair, but if there's someone with both interest and expertise...)

I'd need to get a clue...

Did anybody get my pun with the "tclish"?

p