Tom Hughes wrote:
> 
> In message <[EMAIL PROTECTED]>
>           Simon Cozens <[EMAIL PROTECTED]> wrote:
> 
> > In a sense, though, you're right; this is a general problem. I'm currently
> > trying to work out a design for a tokeniser, and it seems to me that
> > there's going to be a lot of communicating of "hints" between the
> > tokeniser, the lexer and the parser.
> 
> You have to be vary careful about downward communication from the
> parser to the lexer if there's any lookahead involved as you can
> find that you're trying to affect the lexing of tokens which are
> already in the lookahead buffer of the parser.
> 
> Backtracking may well be better than lookahead here as you can always
> jump back a bit after you change the lexer's state ;-)

The difference may be more illusory than real. In either case, you have
to undo something: your recursion stack for backtracking, some parser
state for lookahead. And in both cases, undoing those things is a hell
of a lot easier than undoing the code that was run when you recognized
(or thought you recognized) some chunk of tokens as an anonymous sub or
whatever. For example, say you stuck an entry for the subroutine into
the package's symbol table. You'd better have kept the original, in case
you were wrong -- but you might not want to keep all originals, or
you'll blow your memory. Perhaps we can avoid doing anything significant
during parsing (when will BEGIN{} run?), but perhaps not.

Handling the parser's state can be done in a backtracking DFA-like or a
direct NFA-like way. The NFA way is to keep track of all possible parse
states and advance each one in parallel based on the next token. The DFA
way is recursive descent, backing out of blind alleys and trying again,
keeping a single working hypothesis alive at a time. The DFA approach is
probably easier to undo user code in, because in the NFA case you have
to consider each token under the assumptions of all possible parses up
to that point. The NFA case has the advantage that you never have to
back up, so you can permanently forget about a token as soon as it
whizzes by.

Perl5 is parseable with a single token of lookahead and lots of
parser/lexer communication. Sort of. It would be nice to prevent it from
getting any worse. We could pretend to support full DWIMmery by telling
the user when it fails:

10 print foo bar();
13 sub foo { ... }

DWIMmery badness 10000: Sorry, but I screwed up by assuming 'foo' was a
direct object in line 10, and only found out on line 13. Would you mind
predeclaring 'foo' somewhere before line 10?

...but that would be weird.

print foo bar();
eval "sub foo { $code }";
print foo bar();

Reply via email to