On Tue, Jul 05, 2005 at 12:45:11PM -0500, Patrick R. Michaud wrote:
: On Tue, Jul 05, 2005 at 08:51:39AM -0700, Larry Wall wrote:
: >
: > For languages that cannot do one-pass parsing, it would be saner in
: > the long run for rules to delimit such code with delimiters that
: > are unlikely to occur in the target language. Double curlies, or
: > here docs, or some such. That's hacky, but counting brackets is
: > also hacky.
:
: Any chance we could identify such a set of delimiters and standardize
: them within the rules language, or at least within PGE?

Sure. For the moment it's still our language to do however we like.

: In the general case, Parrot's "compile" opcode doesn't (yet?) have
: the interface we'd really need or want in order to do the "parse
: a program up to a closing delimiter and return the result" sort
: of thing that we'd need. And if a target language is more of
: an interpreter and not a compiler, then it might need a flag that
: indicates "just parse this, don't execute it yet". Maybe we
: need a "parse" opcode, or a standard parse function-call interface
: that language parsers use to interface with PGE.

Some languages could have difficulty with that, I suppose.

: It could also be that one might wish to avoid parsing and/or
: compiling a code block until it's actually encountered during
: execution of the rule.

Hmm. That's probably more in the province of { FooLang.eval("...") },
assuming FooLang doesn't have its own eval.

: Perl 6 rightfully grabs { ... } for its code blocks, and it's reasonable
: to expect the rules engine to have some intimate knowledge of the
: Perl 6 parser (and vice versa). But from a more general "tool"
: perspective it seems like it'd be nice to have a set of delimiters
: available for codeblocks that the rules parser could use without
: having to communicate with another parser to handle it. Even if
: the chosen delimiters aren't a 100% solution, if they manage to cover
: a wide swath of the most common language syntaxes I think it'd be
: a win for most language and tool developers.
:
: Technically, one could conceivably use the string-argument form of
: subrules (mentioned in A05) to achieve this:
:
:     \d+ <tcl: ...tcl-code-here...>
:
: but we'd have to have some mechanism to escape any '>' characters
: in the string argument. And that could get pretty nasty, and we
: haven't really spec'd out the string delimiters here yet (do all
: backslash-escapes get processed)?

Yes, avoiding that kind of ambiguity is precisely why the Perl call
variant requires parens: <foo(...)>. It would be yucky to reintroduce
it on behalf of other languages. But, hey... :-)

: In the more general case it seems like it'd be really nice to
: have a generalized delimiter available, such as double-curlies,
: angle+double-curlies, or something that could be extracted without
: resorting to a language-specific parser to find the end of the code
: block:
:
:     rx :code('tcl') / \d+ {{ ...codeblock source... }} /
:     rx :code('tcl') / \d+ <{{ ...codeblock source... }}> /
:     rx / \d+ <tcl{{ ...codeblock source... }}> /
:
: Some other ideas are at the end of this message.

I think I'd rather see a :lang('tcl') option since :c is taken and :l
isn't. But mostly people will want to put

    use rule :lang<tcl>;

or some such at the beginning of the file, since all the rule actions
are likely to be in the same language.

: It might also be worth mentioning that the format of the opening
: delimiter isn't terribly important -- it's just a closing delimiter token
: that PGE or a rules engine needs to scan for; and hopefully that closing
: delimiter is something that is unlikely to ever appear in a code block
: for a target language (or can be easily worked around when it does).

But from a human point of view it would be nice if it feels "nesty".

: Anyway, if this is really chasing down a dead-end, just say so and
: we'll go with the other approach. But I feel like it simplifies things
: a lot (both for PGE and for compiler authors) if we can have a
: generalized code block delimiter available.

We need to pursue both options. Languages with left-to-right parsers
aren't going to want to put {{...}} everywhere.

: Pm
:
: Some other random syntax possibilities--using chars not yet defined
: for angles:
:
:     <| ... code block ... |>
:     <* ... code block ... *>
:     <` ... code block ... `>
:     <^ ... code block ... ^>
:     <~ ... code block ... ~>
:
: Thus far I think I like {{...}}, <{{...}}>, or <`...`> the best.

I'd lean more toward the infinitely extensible PODly technique:

    {...}
    {{...}}
    {{{...}}}
    {{{{...}}}}
    ...

In any case, we probably want the outermost delimiters to be curlies
so that we can use these anywhere we allow Perl closures, such as

    \d**{{ (range 1 5) }}

Though perhaps Lisp is a bad example of a bad language. :-)

Larry
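
For illustration only, here is a rough Python sketch of how a rules engine
might pull out a code block delimited in the PODly style above without
consulting the target language's parser. Nothing here is PGE or Parrot
code: the function name extract_code_block, its return shape, and the
error handling are all assumptions made up for the example. The idea is
just to count the run of opening curlies and then scan for the first run
of the same number of closing curlies.

```python
def extract_code_block(rule, start):
    """Return (source, end) for a code block that opens at rule[start].

    Hypothetical helper: the block opens with a run of N '{' characters
    and closes at the first run of N '}' characters. The text in between
    is never parsed, so any target language can go there.
    """
    if start >= len(rule) or rule[start] != "{":
        raise ValueError("no code block at offset %d" % start)

    # Count the opening curlies to learn how many closers to scan for.
    n = 0
    while start + n < len(rule) and rule[start + n] == "{":
        n += 1

    body_start = start + n
    close = rule.find("}" * n, body_start)
    if close == -1:
        raise ValueError("unterminated code block")

    # 'end' is the offset just past the closing delimiter.
    return rule[body_start:close], close + n


# A Tcl action containing single curlies, wrapped in double curlies.
rule = r"\d+ {{ set x [expr {$n + 1}] }} \s*"
src, end = extract_code_block(rule, rule.index("{"))
print(repr(src.strip()))
print(repr(rule[end:]))
```

The sketch has the limitations you would expect from a pure scan: the
body must not begin with a curly directly after the opener, and it must
not contain N closing curlies in a row (including a curly butted up
against the closer). Whitespace or a longer delimiter works around both.
With a single curly it stops at the first '}', which is why a plain
{...} block would still need the real Perl parser.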