On Tue, Jul 05, 2005 at 08:51:39AM -0700, Larry Wall wrote:
> 
> For languages that cannot do one-pass parsing, it would be saner in
> the long run for rules to delimit such code with delimiters that
> are unlikely to occur in the target language.  Double curlies, or
> here docs, or some such.  That's hacky, but counting brackets is
> also hacky.  

Any chance we could identify such a set of delimiters and standardize
them within the rules language, or at least within PGE?  

In the general case, Parrot's "compile" opcode doesn't (yet?) have
the interface we'd need in order to "parse a program up to a closing
delimiter and return the result".  And if a target language is more of
an interpreter than a compiler, then it might need a flag that
indicates "just parse this, don't execute it yet".  Maybe we
need a "parse" opcode, or a standard parse function-call interface
that language parsers use to interface with PGE.

It could also be that one might wish to avoid parsing and/or 
compiling a code block until it's actually encountered during 
execution of the rule.
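
To make the deferred-compilation idea concrete, here is a rough sketch in
Python (not PIR, and not anything PGE actually does).  The class name
LazyCodeBlock and the "compiler" callable are hypothetical; the only point
is that the rule keeps the raw source text and hands it to the target
language's compiler the first time the block is reached.

```python
class LazyCodeBlock:
    """Holds a code block's source text; compiles it on first execution only."""

    def __init__(self, source, compiler):
        self.source = source
        # Hypothetical interface: compiler(source) returns a callable.
        self._compiler = compiler
        self._compiled = None

    def run(self, *args):
        # Compile only when the block is actually encountered during a match.
        if self._compiled is None:
            self._compiled = self._compiler(self.source)
        return self._compiled(*args)
```

A rule that never reaches the block during matching never pays for the
compile, and a block reached many times is compiled only once.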

Perl 6 rightfully grabs { ... } for its code blocks, and it's reasonable
to expect the rules engine to have some intimate knowledge of the
Perl 6 parser (and vice versa).  But from a more general "tool"
perspective it seems like it'd be nice to have a set of delimiters 
available for codeblocks that the rules parser could use without 
having to communicate with another parser to handle it.  Even if 
the chosen delimiters aren't a 100% solution, if they manage to cover
a wide swath of the most common language syntaxes I think it'd be
a win for most language and tool developers.

Technically, one could conceivably use the string-argument form of
subrules (mentioned in A05) to achieve this:

     \d+  <tcl: ...tcl-code-here...>

but we'd need some mechanism to escape any '>' characters
in the string argument.  That could get pretty nasty, and we
haven't really spec'd out the string delimiters here yet (do all
backslash-escapes get processed?).
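
As an illustration of what the escaping would involve, here is a small
Python sketch of scanning a string argument up to the first unescaped '>'.
It assumes one possible convention, namely that a backslash makes the
following character literal; that convention is purely an assumption for
the example, since the escaping rules haven't been spec'd.  The function
name read_subrule_arg is likewise made up.

```python
def read_subrule_arg(text, start):
    """Scan from `start` to the first unescaped '>'; return (arg, next_index)."""
    out = []
    i = start
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            # Assumption: a backslash makes the next character literal.
            out.append(text[i + 1])
            i += 2
        elif ch == ">":
            return "".join(out), i + 1
        else:
            out.append(ch)
            i += 1
    raise ValueError("unterminated subrule argument")
```

Under that convention the Tcl author has to write `\>` for every greater-than
sign (and presumably `\\` for every backslash), which is the nastiness
referred to above.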

In the more general case it seems like it'd be really nice to
have a generalized delimiter available, such as double-curlies,
angle+double-curlies, or something that could be extracted without
resorting to a language-specific parser to find the end of the code
block:

    rx :code('tcl')  / \d+ {{ ...codeblock source... }} /
    rx :code('tcl')  / \d+ <{{ ...codeblock source... }}> /
    rx / \d+ <tcl{{ ...codeblock source... }}> /

Some other ideas are at the end of this message.

It might also be worth mentioning that the format of the opening
delimiter isn't terribly important -- it's just a closing delimiter token
that PGE or a rules engine needs to scan for; and hopefully that closing
delimiter is something that is unlikely to ever appear in a code block
for a target language (or can be easily worked around when it does).
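
A minimal Python sketch of that scan, assuming "{{" and "}}" as the
delimiters (just one of the candidates, nothing settled).  The function
name extract_code_block is made up for the example; the point is that
the rules parser only does a plain substring search and never needs to
understand the target language.

```python
def extract_code_block(pattern, start, open_delim="{{", close_delim="}}"):
    """Return (source, next_index) for a code block that opens at `start`."""
    if not pattern.startswith(open_delim, start):
        raise ValueError("no code block opens at this position")
    body_start = start + len(open_delim)
    # No bracket counting and no target-language parsing: find the closer.
    end = pattern.find(close_delim, body_start)
    if end == -1:
        raise ValueError("unterminated code block")
    return pattern[body_start:end], end + len(close_delim)


# Usage: single curlies inside the Tcl code don't confuse the scan.
rule = r"\d+ {{ puts [expr {1 + 2}] }} \s*"
source, rest = extract_code_block(rule, rule.index("{{"))
```

The weak spot is the one already mentioned: if the target code itself
contains the closing delimiter (nested Tcl braces ending in "}}", say),
the author has to work around it, for example by inserting whitespace.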

Anyway, if this is really chasing down a dead-end, just say so and
we'll go with the other approach.  But I feel like it simplifies things
a lot (both for PGE and for compiler authors) if we can have a 
generalized code block delimiter available.

Pm

Some other random syntax possibilities--using chars not yet defined
for angles:

    <| ... code block ... |>
    <* ... code block ... *>
    <` ... code block ... `>
    <^ ... code block ... ^>
    <~ ... code block ... ~>

Thus far I think I like {{...}}, <{{...}}>, or <`...`> the best.

Pm
