On Mon, Nov 20, 2000 at 06:01:52PM -0500, Dan Sugalski wrote:
> * The parser will be written mostly in perl, so you have regexes and such 
> to work with

> * It's possible that the whole set of parsing rules may change on the fly, 
> so don't get hung up on constants like "{"--stick to symbolic things like 
> start_scope instead

A thought strikes me. A few perl constructions ('', "", q(), qq() offhand,
possibly others) can contain embedded newlines.
A regular expression to match "" strings, such as /"([^\\"]|\\.)*"/s,
assumes that all the characters it needs are already in memory.
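To make the problem concrete, here is a small sketch. It is in Python rather than perl, and the two-line input is made up. The same pattern finds nothing when it can only see the first buffered line, but it matches once the whole text is in memory:

```python
import re

# The pattern from the text: a double-quoted string with backslash escapes.
# [^\\"] already matches a literal newline; re.S (perl's /s) additionally
# lets the '.' in '\\.' match a newline that follows a backslash.
DQ_STRING = re.compile(r'"([^\\"]|\\.)*"', re.S)

# Hypothetical script whose string literal spans two lines.
source_lines = ['print "hello\n', 'world";\n']

# Only the first line is buffered: there is no closing quote, so no match.
first_line_only = DQ_STRING.search(source_lines[0])

# With the whole text in memory the string is found, newline included.
whole_text = DQ_STRING.search("".join(source_lines))
```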

A parser written in C typically sees the opening " and goes into a loop
munching characters from the input until it meets the closing ". The input
may be line buffered (as in current perl), but if the buffer runs out before
the closing " it is refilled with another line as often as needed.
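That refill-as-you-go loop might look roughly like this. It is a sketch in Python, not perl's actual toke.c. `scan_dq_string` is a hypothetical name, the input is assumed to arrive as an iterator of lines, and escape sequences are kept verbatim rather than interpreted:

```python
from typing import Iterator


def scan_dq_string(lines: Iterator[str], buf: str, pos: int) -> tuple[str, str, int]:
    """Scan a "..." string whose opening quote is at buf[pos].

    Returns (body, current_buffer, index_just_after_closing_quote).
    Whenever the buffer runs out before the closing quote, it is
    replaced with the next line from `lines`.
    """
    assert buf[pos] == '"'
    pos += 1
    body = []
    escaped = False
    while True:
        if pos >= len(buf):
            # Buffer exhausted mid-string: refill with another line.
            nxt = next(lines, None)
            if nxt is None:
                raise SyntaxError("unterminated string")
            buf, pos = nxt, 0
            continue
        ch = buf[pos]
        pos += 1
        if escaped:
            body.append(ch)          # character after a backslash, kept as-is
            escaped = False
        elif ch == "\\":
            body.append(ch)          # keep the backslash verbatim
            escaped = True
        elif ch == '"':
            return "".join(body), buf, pos
        else:
            body.append(ch)


# Usage: the literal starts on one line and ends on the next.
src = iter(['print "hello\n', 'world";\n'])
first = next(src)
body, rest, after = scan_dq_string(src, first, first.index('"'))
```

Because the loop only ever asks for "one more line", it never needs the whole file in memory. The cost is that the scanner has to be hand-written rather than expressed as a single regex.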

How is our quoted string matcher going to work in the face of strings
containing embedded literal newlines?

Are we hoping that we can mmap() most scripts, so that reading isn't much
  of a problem? And sluuuurp the rest in one go? [doesn't feel good]
Are we going to have "lazy scalars" which collude with the regexp engine,
  so that when the regexp engine hits the current end of the buffer, more
  is read from the file handle?
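A crude approximation of the "lazy scalar" idea can be had without touching the regexp engine: retry the match, appending one more line each time it fails. This is a Python sketch and `match_with_refill` is a hypothetical helper. It leans on two assumptions. First, for this particular pattern a failed match means "need more input" until end of file. Second, the first successful match is final, because the closing quote ends the string. Neither holds for regexes in general, and rescanning the growing buffer is quadratic in the worst case:

```python
import re
from typing import Iterator, Optional

DQ_STRING = re.compile(r'"(?:[^\\"]|\\.)*"', re.S)


def match_with_refill(pattern: re.Pattern, lines: Iterator[str],
                      buf: str, pos: int) -> tuple[Optional[re.Match], str]:
    """Try pattern.match(buf, pos); on failure append the next line and retry.

    Returns (match_or_None, possibly_extended_buffer). None means the
    input ran out without a match.
    """
    while True:
        m = pattern.match(buf, pos)
        if m is not None:
            return m, buf
        more = next(lines, None)
        if more is None:
            return None, buf        # end of input: genuinely no match
        buf += more                  # grow the buffer and rescan from pos


# Usage: the string literal spans two lines of input.
src = iter(['print "hello\n', 'world";\n'])
buf = next(src)
m, buf = match_with_refill(DQ_STRING, src, buf, buf.index('"'))
```

A real version would want the engine itself to report "hit end of buffer" rather than plain failure, which is the collusion asked about above.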
Something else?
Or is this a non-problem for some reason I haven't thought of?

Nicholas Clark
