At 12:29 AM 11/25/00 -0500, Sam Tregar wrote:
>On Fri, 24 Nov 2000, Nicholas Clark wrote:
>
> > I think Dan was suggesting that the (user side) regex doesn't change at all
> > (so that's no new syntax there)
> > It's just that the innards of perl gains a tied scalar that doesn't actually
> > read in and buffer the file immediately, but defers it as long as it can get
> > away with. And that the regex engine knows about these lazy scalars and
> > provokes the read-more when needed.
>
>Right. And I was suggesting that while this might solve our problem it
>wouldn't do much for all the other people that have to solve the same
>problem. I'd like to see a general solution accessible from Perl. If
>that solution is some tied-scalar magic, fine. If it's more involved than
>that (and I think it will be) then we'll need to think about the syntax a
>bit.

If it solves our problem, that's just fine. And since the parser will be
mostly in perl, it means that this particular solution will be available to
everyone.

   my file_tie $file_data : name "source.pl";

or something like that.

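To make the lazy-scalar idea concrete, here is a rough Perl 5 sketch. It is not the proposed tie and not how the regex engine would really cooperate. The LazyBuffer package, its more and match methods, and the chunk size are all made up for illustration. The caller, rather than the regex engine, asks for more input whenever a match fails or runs into the end of what has been buffered. That retry rule is only safe for patterns whose match cannot be changed by later input, such as a simple quoted string.

```perl
use strict;
use warnings;

# Hypothetical class: a buffer that reads from a handle only on demand.
package LazyBuffer;

sub new {
    my ($class, $fh, $chunk) = @_;
    return bless { fh => $fh, buf => '', eof => 0, chunk => $chunk || 4096 }, $class;
}

# Pull one more chunk into the buffer; false once the handle is exhausted.
sub more {
    my ($self) = @_;
    return 0 if $self->{eof};
    my $n = read($self->{fh}, my $data, $self->{chunk});
    if (!$n) { $self->{eof} = 1; return 0 }
    $self->{buf} .= $data;
    return 1;
}

# Match $re at the start of the buffer, reading more only when needed.
# Returns the matched text (and consumes it), or undef if input runs out.
sub match {
    my ($self, $re) = @_;
    while (1) {
        if ($self->{buf} =~ /\A(?:$re)/) {
            my $end = $+[0];
            # Accept a match that stops short of the buffered data, or any
            # match once there is nothing left to read. Otherwise retry
            # with the extra data that more() just appended.
            if ($end < length($self->{buf}) || !$self->more) {
                return substr($self->{buf}, 0, $end, '');
            }
            next;
        }
        # No match yet: it may simply need more input.
        return unless $self->more;
    }
}

package main;

# Usage: an in-memory handle stands in for the script file or pipe.
my $source = qq{"hello world" rest};
open my $fh, '<', \$source or die $!;
my $lazy = LazyBuffer->new($fh, 4);          # tiny chunks to show the laziness
my $str  = $lazy->match(qr/"[^"]*"/);
print "$str\n";                              # prints "hello world" with its quotes
```

Note that nothing here ever seeks the handle, so it would behave the same on a pipe or socket. An unterminated string still forces the whole input to be buffered before match gives up.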
> > I don't think that this differs from the current parser. If it encounters
> > open " but never a close ", it will read and buffer to the end of file
> > before realising that there's a problem. (because strictly there isn't
> > a problem until EOF is encountered before the closing ")
> >
> > I'm not certain there's anything that can actually be done to avert the need
> > to buffer a lot of script in these situations. You mustn't attempt to seek
> > the script file handle as it might be from something unseekable such as a
> > pipe (or socket. BEGIN {socket STDIN...})
>
>I suppose that's true. I was imagining something less extreme than the
>absolute failure case of missing a closing ". I'm imagining a failure
>that is recoverable but still requires running the regex to the end of the
>"string" to find that out. Are there any like this? Perhaps not.

I don't think there is, really. You might be able to recover from the loss
of a closing parenthesis if you can assume that the statement close would
close up a paren. I'm thinking something like:

   $foo = bar(12;

where perl could guess that the missing paren's just before the semi-colon.
Anything past that would probably be really dodgy--if there were multiple
parens there, where would you put the missing one?

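As a sketch of how narrow that guess would have to be, something like the following would do. The guess_missing_paren function is hypothetical and is not anything perl does. It also naively counts parens inside strings and comments, and it checks balance by count only. It repairs only the single-paren case and refuses anything more ambiguous.

```perl
use strict;
use warnings;

# Hypothetical recovery heuristic: if a statement has exactly one '(' and
# no ')', assume the missing ')' belongs just before the closing semicolon.
# Returns the (possibly repaired) statement, or undef when it will not guess.
sub guess_missing_paren {
    my ($stmt) = @_;
    my $open  = ($stmt =~ tr/(//);    # count of '(' characters
    my $close = ($stmt =~ tr/)//);    # count of ')' characters
    return $stmt if $open == $close;            # balanced by count: leave alone
    return unless $open == 1 && $close == 0;    # several parens: too dodgy
    return unless $stmt =~ s/\s*;\s*\z/);/;     # needs a statement close
    return $stmt;
}

print guess_missing_paren('$foo = bar(12;'), "\n";    # prints $foo = bar(12);
```

With '$foo = bar(baz(12;' the function returns undef rather than picking a spot for the missing paren.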
>Perhaps this just isn't a reasonable criticism of regex parsers since
>normal parsers do it all the time anyway!

It's certainly a reasonable criticism of parsers in general, and a good one
to keep in mind with regex based parsers. It's easier to overdo it with a
bad regex than it is to crock your character-by-character hardcoded state
machine.

Dan
--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                     have teddy bears and even
                                      teddy bears get drunk