On Sat, Jun 12, 2010 at 12:06 PM, Kris Maglione <maglion...@gmail.com> wrote:
> On Sat, Jun 12, 2010 at 12:53:27PM +0200, pancake wrote:
>>
>> On Jun 12, 2010, at 9:27 AM, Connor Lane Smith <c...@lubutu.com> wrote:
>>>
>>> On 12 June 2010 08:00, Kris Maglione <maglion...@gmail.com> wrote:
>>> Except it can actually fetch as much data as is addressable in memory
>>> in a single call, if the kernel and library are tailored to.
>>
>> That's why mmap is for. Using read is just stupid.
>
> mmap is silly. If you want that much data mapped, it's because you want fast
> access to it. If you just want random access to it, you read it as you need
> it. mmap doesn't offer any performance advantage. When you touch a page that
> wasn't already there, the kernel has to fault it in, which is already as
> expensive as the read system call, and even more so because of the coarse
> granularity. It needs to read in an entire page, even if all you need is a
> byte. And if you need a dword across a page boundary, you get two faults and
> two pages read in. There's really just no point.
I just know I'm going to regret getting involved in this, but...

My understanding is that, on Linux at least, reading causes the data to be brought into the kernel's page cache (which I believe has page-level granularity even if you "read only a byte"), and then a copy is made from the page cache into the process's memory space. Mmapping the file means your process gets the page-cache page mapped into its address space, so the data is in memory only once rather than, on average, 1.x times, where x depends on the page cache's discard policy.

So IF you are genuinely moving unpredictably around a truly huge file, mmapping it means you can fit more of it in memory, rather than having both your program and the page cache trying to figure out which bits to discard in an attempt to keep memory usage down. This effect actually matters much more with huge files than with smaller ones, where the page cache duplication has less effect on overall system memory usage.

--
cheers, dave tweed
__________________________
computer vision researcher: david.tw...@gmail.com
"while having code so boring anyone can maintain it, use Python."
-- attempted insult seen on slashdot
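To make the two access patterns concrete, here is a minimal Python sketch, assuming a POSIX system (`os.pread` is not available on Windows). The helper names `pages_spanned`, `read_at` and `mmap_at` are hypothetical, not from the thread. `read_at` uses `pread`, so the kernel copies bytes out of its page cache into a fresh buffer. `mmap_at` maps the file, so page-cache pages are faulted into the process on first touch. `pages_spanned` illustrates the granularity point from the quoted message: a 4-byte access that straddles a page boundary touches two pages. Note that slicing an mmap object in Python still copies into a new `bytes` object, so this shows the access pattern, not the memory saving itself.

```python
import mmap
import os

PAGE = mmap.PAGESIZE  # this system's page size (commonly 4096 bytes)


def pages_spanned(offset: int, length: int, page: int = PAGE) -> range:
    """Indices of the pages touched by an access of `length` bytes at `offset`.

    With mmap, each page in this range that is not already resident costs a fault.
    """
    if length < 1:
        raise ValueError("length must be positive")
    first = offset // page
    last = (offset + length - 1) // page
    return range(first, last + 1)


def read_at(path: str, offset: int, length: int) -> bytes:
    """read()-style access: the kernel copies from its page cache into a new buffer."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.pread(fd, length, offset)
    finally:
        os.close(fd)


def mmap_at(path: str, offset: int, length: int) -> bytes:
    """mmap-style access: page-cache pages are mapped in and faulted on first touch.

    The slice below still copies the touched bytes into a Python bytes object.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return m[offset:offset + length]
```

For example, with 4096-byte pages, `list(pages_spanned(4094, 4, page=4096))` is `[0, 1]` (two pages, so potentially two faults), while the same 4-byte access at offset 0 touches only page 0.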