On Sat, Jun 12, 2010 at 03:18:57PM +0100, David Tweed wrote:
> I just know I'm going to regret getting involved in this but...

Probably not. You seem reasonable. I only flame trolls.

> My understanding is that on Linux at least, reading causes the data to
> be moved into the kernel's page cache (which I believe has page-level
> granularity even if you "read only a byte"), and then a copy is made
> from the page cache into the process's memory space. Mmapping it
> means your process gets the page cache page mapped into its address
> space, so the data is only in memory once rather than an average of
> 1.x times, where x depends on the page cache discard policy. So IF you
> are genuinely moving unpredictably around accessing a truly huge file,
> mmapping it means that you can fit more of it in memory rather than
> having both your program and the page cache trying to figure out which
> bits to discard in an attempt to keep memory usage down. This effect
> is actually much more important with huge files than with smaller
> files, where the page cache duplication doesn't have as much effect on
> system memory usage as a whole.

You may be right; I don't know very much about Linux's buffer cache. Even so, I'd consider read the better option in most of the use cases I can think of. There are probably cases where mmap would be more efficient, but I rather expect that realizing those gains depends on the programmer knowing, in fairly fine detail, when and why. That doesn't mean mmap should be used instead of read wherever possible.
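
For concreteness, here's a minimal sketch of the two approaches being
discussed; the file name, buffer size, and error handling are just
placeholders, not anything from a real program:

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
	int fd = open("huge.dat", O_RDONLY);
	if (fd < 0) { perror("open"); return 1; }

	struct stat st;
	if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

	/* read(): the kernel fills the page cache, then copies the
	 * requested bytes into buf, so the data is in memory twice. */
	uint8_t buf[4096];
	ssize_t n = pread(fd, buf, sizeof buf, 0);
	if (n < 0) { perror("pread"); return 1; }
	printf("first byte via read: %u\n", (unsigned)buf[0]);

	/* mmap(): the page cache pages are mapped directly into the
	 * process's address space; no second copy is made, and pages
	 * are faulted in only as they are touched. */
	uint8_t *map = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED) { perror("mmap"); return 1; }
	printf("first byte via mmap: %u\n", (unsigned)map[0]);

	munmap(map, st.st_size);
	close(fd);
	return 0;
}

In the read version the bytes end up both in the page cache and in buf;
in the mmap version the mapped pages *are* the page cache pages, which
is the duplication David is talking about.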

--
Kris Maglione

It is a farce to call any being virtuous whose virtues do not result
from the exercise of its own reason.
        --Mary Wollstonecraft

