On Tue, Jul 05, 2005 at 08:51:39AM -0700, Larry Wall wrote:
> For languages that cannot do one-pass parsing, it would be saner in
> the long run for rules to delimit such code with delimiters that
> are unlikely to occur in the target language.  Double curlies, or
> here docs, or some such.  That's hacky, but counting brackets is
> also hacky.

Any chance we could identify such a set of delimiters and standardize
them within the rules language, or at least within PGE?

In the general case, Parrot's "compile" opcode doesn't (yet?) have
the interface we'd need in order to do the "parse a program up to a
closing delimiter and return the result" sort of thing.  And if a
target language is more of an interpreter than a compiler, then it
might need a flag that indicates "just parse this, don't execute it
yet".  Maybe we need a "parse" opcode, or a standard parse
function-call interface that language parsers use to interface with
PGE.  It could also be that one might wish to avoid parsing and/or
compiling a code block until it's actually encountered during
execution of the rule.

Perl 6 rightfully grabs { ... } for its code blocks, and it's
reasonable to expect the rules engine to have some intimate knowledge
of the Perl 6 parser (and vice versa).  But from a more general "tool"
perspective it seems like it'd be nice to have a set of delimiters
available for code blocks that the rules parser could use without
having to communicate with another parser to handle it.  Even if the
chosen delimiters aren't a 100% solution, if they manage to cover a
wide swath of the most common language syntaxes I think it'd be a win
for most language and tool developers.

Technically, one could conceivably use the string-argument form of
subrules (mentioned in A05) to achieve this:

    \d+ <tcl: ...tcl-code-here...>

but we'd have to have some mechanism to escape any '>' characters in
the string argument.  That could get pretty nasty, and we haven't
really spec'd out the string delimiters here yet (do all
backslash-escapes get processed?).

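To make the '>' problem concrete, here is a rough sketch in Python.
It is not PGE or Parrot code: the helper name extract_block and the
sample Tcl fragments are invented purely for illustration.  It scans
for a fixed closing token with no knowledge of the embedded language,
and shows that a single '>' closer cuts the Tcl code short while a
'}}' closer (the double curlies Larry mentions) does not.

```python
def extract_block(rule_src, start, closer):
    """Hypothetical helper: return (code, next_pos) for a code block
    whose body begins at index `start` and ends at the first
    occurrence of the literal token `closer`."""
    end = rule_src.find(closer, start)
    if end == -1:
        raise ValueError("unterminated code block: missing %r" % closer)
    return rule_src[start:end], end + len(closer)


# String-argument form: the '>' inside the Tcl code is mistaken
# for the end of the subrule, so the block is truncated.
rule1 = r'\d+ <tcl: if {$x > 3} {puts big}>'
start1 = rule1.index('<tcl:') + len('<tcl:')
code1, pos1 = extract_block(rule1, start1, '>')

# Double-curly form: the same Tcl code survives intact, because
# '}}' does not occur inside it.
rule2 = r'\d+ {{ if {$x > 3} {puts big} }}'
start2 = rule2.index('{{') + len('{{')
code2, pos2 = extract_block(rule2, start2, '}}')

print(repr(code1))
print(repr(code2))
```

The double-curly scan is still not a complete answer: Tcl such as
{puts {a b}} contains '}}' itself, so the author of the rule would
have to put a space between the two closing braces.
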
In the more general case it seems like it'd be really nice to have a
generalized delimiter available, such as double-curlies,
angle+double-curlies, or something that could be extracted without
resorting to a language-specific parser to find the end of the code
block:

    rx :code('tcl') / \d+ {{ ...codeblock source... }} /
    rx :code('tcl') / \d+ <{{ ...codeblock source... }}> /
    rx / \d+ <tcl{{ ...codeblock source... }}> /

Some other ideas are at the end of this message.

It might also be worth mentioning that the format of the opening
delimiter isn't terribly important -- it's just a closing delimiter
token that PGE or a rules engine needs to scan for; and hopefully
that closing delimiter is something that is unlikely to ever appear
in a code block for a target language (or can be easily worked around
when it does).

Anyway, if this is really chasing down a dead-end, just say so and
we'll go with the other approach.  But I feel like it simplifies
things a lot (both for PGE and for compiler authors) if we can have a
generalized code block delimiter available.

Pm

Some other random syntax possibilities--using chars not yet defined
for angles:

    <| ... code block ... |>
    <* ... code block ... *>
    <` ... code block ... `>
    <^ ... code block ... ^>
    <~ ... code block ... ~>

Thus far I think I like {{...}}, <{{...}}>, or <`...`> the best.

Pm