On Tue, 26 Apr 2005 19:32:29 +0100, Robin Becker wrote:

> Skip Montanaro wrote:
>>     Robin> So we avoid dirty page writes etc etc. However, I still think I
>>     Robin> could get away with a small window into the file which would be
>>     Robin> more efficient.
>>
>> It's hard to imagine how sliding a small window onto a file within Python
>> would be more efficient than the operating system's paging system. ;-)
>>
>> Skip
> well it might be if I only want to scan forward through the file (think
> lexical analysis). Most lexical analyzers use a buffer and produce a stream
> of tokens ie a compressed version of the input. There are problems crossing
> buffers etc, but we never normally need the whole file in memory.

I think you might have a misunderstanding here. mmap puts a file into
*virtual* memory. It does *not* read the whole thing into physical memory; if
it did, there would be no purpose to mmap support in the OS in the first
place, as a thin wrapper around existing file calls would work.

> If the lexical analyzer reads the whole file into memory then we need more
> pages. The mmap thing might help as we need only read pages (for a lexical
> scanner).

The read-write status of the pages is not why mmap is an advantage; the
advantage is that the OS naturally and transparently takes care of loading
just the portions you want, and intelligently discards them when you are
done (more intelligently than you could, even in theory, since it can take
advantage of knowing the entire state of the system, which your program
cannot).

In other words, as Skip was trying to tell you, mmap *already does* what you
are saying might be better, and it does it better than you can, even in
theory, from inside a process (as the OS will not reveal to you the data
structures it has that you would need to match that performance).

As you try to understand mmap, make sure your mental model can take into
account the fact that it is easy and quite common to mmap a file several
times larger than your physical memory, and it does not even *try* to read
the whole thing in at any given time. You may benefit from reviewing/studying
the difference between virtual memory and physical memory.
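
To make the forward-scanning case concrete, here is a minimal sketch of a lexer-style scan over an mmap'd file. It assumes Python 3 (for bytes patterns and mmap as a context manager); the function name iter_tokens, the TOKEN pattern, and the toy token grammar are all made up for illustration and are not from the thread. The point is only that the code never manages a buffer or a window itself: the regex walks the mapping, and the OS pages data in as it is touched.

```python
import mmap
import os
import re

# Hypothetical toy grammar: identifiers, integers, or any single
# non-whitespace character. It never matches the empty string.
TOKEN = re.compile(rb"[A-Za-z_][A-Za-z0-9_]*|\d+|\S")


def iter_tokens(path):
    """Yield tokens (as bytes) from the file at `path`, scanning forward."""
    with open(path, "rb") as f:
        # mmap refuses to map an empty file, so handle that case up front.
        if os.fstat(f.fileno()).st_size == 0:
            return
        # Length 0 maps the whole file; no data is read at this point.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            pos = 0
            while True:
                # The regex engine reads the mapping directly; pages are
                # faulted in by the OS only as the scan reaches them.
                match = TOKEN.search(mm, pos)
                if match is None:
                    break
                pos = match.end()
                yield match.group()
```

Something like `for tok in iter_tokens("big_input.txt"): ...` (the file name is hypothetical) would then consume the tokens one at a time. There is no buffer-crossing logic anywhere, because from the program's point of view the file is one contiguous bytes-like object.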