Re: regex over files

Robin Becker Tue, 26 Apr 2005 11:34:49 -0700

Skip Montanaro wrote:

    Robin> So we avoid dirty page writes etc etc. However, I still think I
    Robin> could get away with a small window into the file which would be
    Robin> more efficient.

It's hard to imagine how sliding a small window onto a file within Python
would be more efficient than the operating system's paging system. ;-)

Skip

Well, it might be if I only want to scan forward through the file (think lexical analysis). Most lexical analyzers use a buffer and produce a stream of tokens, i.e. a compressed version of the input. There are problems when a token crosses a buffer boundary, but we never normally need the whole file in memory.
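To make the buffer idea concrete, here is a rough sketch of a forward-only scanner that keeps roughly one buffer plus one partial token in memory. The token pattern, the `tokens` name and the `bufsize` parameter are all made up for illustration; a real lexer would have its own token classes.

```python
import io
import re

# Hypothetical token classes for illustration: identifiers, integers,
# and any other single non-space character.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")

def tokens(stream, bufsize=4096):
    """Yield tokens from a text stream read in fixed-size chunks."""
    pending = ""                      # unconsumed tail of the previous buffer
    while True:
        chunk = stream.read(bufsize)
        at_eof = not chunk
        buf = pending + chunk
        pos = 0
        for m in TOKEN.finditer(buf):
            # A match that touches the end of the buffer may continue in
            # the next chunk, so defer it unless the input is exhausted.
            if m.end() == len(buf) and not at_eof:
                break
            yield m.group()
            pos = m.end()
        pending = buf[pos:]           # carry the possible partial token over
        if at_eof:
            break

if __name__ == "__main__":
    # Tiny buffer to force tokens to straddle chunk boundaries.
    print(list(tokens(io.StringIO("foo = bar42 + 1234"), bufsize=4)))
```

The carry-over in `pending` is the "crossing buffers" problem: it only works this simply because these assumed token patterns can never change their match once more input arrives, other than by growing at the end of the buffer.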

If the lexical analyzer reads the whole file into memory then we need more pages. The mmap approach might help, since a lexical scanner only ever reads pages and never dirties them.
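For comparison, the mmap route looks something like the sketch below: map the file read-only and let re walk the mapping, so the OS pages data in on demand. The pattern and the `count_identifiers` helper are illustrative assumptions, not anything from the thread's actual code.

```python
import mmap
import re

# Hypothetical pattern for illustration; it must be a bytes pattern
# because an mmap object exposes bytes, not str.
WORD = re.compile(rb"[A-Za-z_]\w*")

def count_identifiers(path):
    """Count identifier-like tokens in a file via a read-only mapping."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; note that mmap raises ValueError
        # for an empty file, which this sketch does not handle.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Consume all matches before the mapping is closed.
            return sum(1 for _ in WORD.finditer(mm))
```

With `ACCESS_READ` nothing is ever written back, so there are no dirty pages, but the address space for the whole file is still reserved for the life of the mapping.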

Scanners work by detecting the transitions between tokens, so even if the tokens are very long we don't need to store them twice (in the input stream and in the token accumulator). I suppose that could be true of regex pattern matchers, but it doesn't seem to be for re, i.e. we need the entire text that the pattern matches to be present in the input before we can match and extract to an accumulator.

-- Robin Becker

--
http://mail.python.org/mailman/listinfo/python-list
