On Sat, Jun 12, 2010 at 03:18:57PM +0100, David Tweed wrote:
> I just know I'm going to regret getting involved in this but...
Probably not. You seem reasonable. I only flame trolls.
> My understanding is that on Linux, at least, reading causes the data to be moved into the kernel's page cache (which I believe has page-level granularity even if you "read only a byte"), and then a copy is made from the page cache into the process's memory space. Mmapping the file means your process gets the page-cache page mapped into its address space, so the data is in memory only once rather than an average of 1.x times, where x depends on the page cache's discard policy.
>
> So IF you are genuinely moving unpredictably around a truly huge file, mmapping it means you can fit more of it in memory, rather than having both your program and the page cache trying to figure out which bits to discard to keep memory usage down. This effect matters much more with huge files than with smaller ones, where the page-cache duplication has less impact on overall system memory usage.
You may be right; I don't know very much about Linux's page cache. Even so, I'd consider read the better option in most use cases I can think of. There are probably cases where mmap would be more efficient, but I rather expect that realizing those gains depends on the programmer knowing, in fairly fine detail, when and why they arise. It doesn't follow that mmap should be used instead of read wherever possible.
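For what it's worth, a minimal sketch of the two access paths being discussed (the file path and buffer size are made up, and it only touches the start of the file): with pread() the kernel copies data out of the page cache into a private buffer, while with mmap() the page-cache pages themselves appear in the process's address space.

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
	const char *path = "/tmp/bigfile";	/* hypothetical example file */
	char buf[4096];				/* arbitrary buffer size */
	struct stat st;
	char *map;
	int fd;

	if ((fd = open(path, O_RDONLY)) < 0 || fstat(fd, &st) < 0 || st.st_size == 0) {
		perror(path);
		return 1;
	}

	/* read path: page cache -> buf, a second, private copy of the data */
	if (pread(fd, buf, sizeof buf, 0) > 0)
		printf("read copy, first byte: %d\n", buf[0]);

	/* mmap path: the page-cache pages are mapped directly, no extra copy */
	if ((map = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0)) == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	printf("mmap view, first byte: %d\n", map[0]);

	munmap(map, st.st_size);
	close(fd);
	return 0;
}

Both routes go through the page cache; the difference the sketch shows is only whether a second, private copy of the data is made.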
--
Kris Maglione

It is a farce to call any being virtuous whose virtues do not result from the exercise of its own reason.
	--Mary Wollstonecraft