On Tue, Jul 05, 2005 at 08:42:44AM -0500, Patrick R. Michaud wrote:
: In short, when PGE's
: parser encounters a code block, it needs to hand off control to
: the target language's compiler to parse to the end of the
: code block and receive back from that compiler the length of
: the block parsed.

That is the preferred way.

: Or, we try to do something along the lines of Text::Balanced and
: find the closing brace by doing simple delimiter counting...

Perl 5 would have done it with two passes, which is why Perl 6 is
specced to always do one-pass parsing instead.  :-)

One wrinkle of one-pass is that the parser must somehow know where
to quit.  It's fine if you know you want a block and can call a rule
that automatically terminates on '}', or if your rule reliably gets an
error at the point it should stop.  But in the general case you might
want to be able to pass in a set of terminating delimiters that stop
the outermost parse even if it looks like it should otherwise continue.
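
A minimal sketch of that idea, in Python rather than anything PGE
actually provides: the toy grammar (bare characters, double-quoted
strings, nested curlies) and the names `parse_block` and `stop` are
assumptions made up for illustration.  The caller passes in a set of
terminators, and the parser hands back the position where it quit.

```python
def parse_block(src, pos=0, stop=frozenset()):
    """One-pass parse of a toy language; returns the index where parsing stopped.

    Hypothetical grammar: bare characters, "double-quoted strings",
    and nested { ... } groups.  `stop` applies to the outermost level only.
    """
    while pos < len(src):
        ch = src[pos]
        if ch in stop:
            # Terminator seen at this nesting level: hand control back.
            return pos
        if ch == '"':
            # Skip a string literal; delimiters inside it are not syntax.
            end = src.find('"', pos + 1)
            if end < 0:
                raise SyntaxError("unterminated string")
            pos = end + 1
        elif ch == '{':
            # A nested group terminates on '}' only, not on the caller's stop set.
            inner = parse_block(src, pos + 1, stop=frozenset('}'))
            if inner >= len(src):
                raise SyntaxError("unterminated block")
            pos = inner + 1
        else:
            pos += 1
    return pos


src = 'x { y; z } ; tail'
end = parse_block(src, 0, stop=frozenset(';'))
print(end, repr(src[:end]))
```

The ';' inside the curlies does not stop the parse, because the nested
call gets its own stop set; only the outer ';' does.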

: ...Perl 6 brings in several new delimiters that have to be taken into
: account in the balancing act.

These should generally be recognizable from Unicode properties, but
that's not a good reason to take the balanced approach.  And as soon
as people start defining their own circumfix operators that violate
the Unicode properties, all bets are off.  Even a user-defined quote
is going to cause grief unless you know the / of xxx/.../ is an opener.
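
To make the failure concrete, here is a small sketch of plain
delimiter counting tripping over a quoted brace.  The function name
`naive_close` and the sample snippet are hypothetical; this is not how
Text::Balanced is implemented, just the simplest possible counter.

```python
def naive_close(src, start):
    """Find the matching '}' by counting braces, ignoring all other syntax."""
    depth = 0
    for i in range(start, len(src)):
        if src[i] == '{':
            depth += 1
        elif src[i] == '}':
            depth -= 1
            if depth == 0:
                return i
    return -1  # no balanced close found


src = '{ say "}"; }'
guess = naive_close(src, 0)   # stops at the brace inside the string
actual = src.rindex('}')      # the brace that really closes the block
print(guess, actual)          # 7 11
```

The counter could be taught about double quotes, but it would still
have no way of knowing about a quote or circumfix the user defined.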

For languages that cannot do one-pass parsing, it would be saner in
the long run for rules to delimit such code with delimiters that
are unlikely to occur in the target language.  Double curlies, or
here docs, or some such.  That's hacky, but counting brackets is
also hacky.  Any time you write two different parsers for the same
language, it's a new set of bugs just waiting to happen.
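
A rough sketch of the double-curly approach, assuming a hypothetical
helper `extract_embedded` and a made-up rule string: the rule parser
does no counting at all and just searches for the first literal "}}"
after the opener.

```python
def extract_embedded(src, pos=0):
    """Return (code, end) for the first {{ ... }} region at or after pos.

    `end` is the index just past the closing "}}".  The embedded code is
    never parsed here; it is only sliced out for the target compiler.
    """
    open_at = src.find('{{', pos)
    if open_at < 0:
        raise ValueError("no {{ opener found")
    close_at = src.find('}}', open_at + 2)
    if close_at < 0:
        raise ValueError("no }} closer found")
    return src[open_at + 2:close_at], close_at + 2


rule = 'foo {{ if (x) { y(); } }} bar'
code, end = extract_embedded(rule)
print(repr(code.strip()), repr(rule[end:]))
```

This is only safe as long as the embedded code never contains "}}"
itself (two nested blocks closing back to back would break it), which
is exactly the hackiness mentioned above.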

Larry
