On 4/30/12 3:49 AM, Alan Cox wrote:
On 04/11/2012 01:07, Andrey Zonov wrote:
On 10.04.2012 20:19, Alan Cox wrote:
On 04/09/2012 10:26, John Baldwin wrote:
On Thursday, April 05, 2012 11:54:31 am Alan Cox wrote:
On 04/04/2012 02:17, Konstantin Belousov wrote:
On Tue, Apr 03, 2012 at 11:02:53PM +0400, Andrey Zonov wrote:
Hi,
I open the file, then call mmap() on the whole file and get a pointer,
then I work with this pointer. I expect that each page should be
touched only once to get it into memory (disk cache?), but this
doesn't work!
I wrote the test (attached) and ran it for a 1G file generated from
/dev/random; the result is the following:
Prepare file:
# swapoff -a
# newfs /dev/ada0b
# mount /dev/ada0b /mnt
# dd if=/dev/random of=/mnt/random-1024 bs=1m count=1024
Purge cache:
# umount /mnt
# mount /dev/ada0b /mnt
Run test:
$ ./mmap /mnt/random-1024 30
mmap: 1 pass took: 7.431046 (none: 262112; res: 32; super: 0; other: 0)
mmap: 2 pass took: 7.356670 (none: 261648; res: 496; super: 0; other: 0)
mmap: 3 pass took: 7.307094 (none: 260521; res: 1623; super: 0; other: 0)
mmap: 4 pass took: 7.350239 (none: 258904; res: 3240; super: 0; other: 0)
mmap: 5 pass took: 7.392480 (none: 257286; res: 4858; super: 0; other: 0)
mmap: 6 pass took: 7.292069 (none: 255584; res: 6560; super: 0; other: 0)
mmap: 7 pass took: 7.048980 (none: 251142; res: 11002; super: 0; other: 0)
mmap: 8 pass took: 6.899387 (none: 247584; res: 14560; super: 0; other: 0)
mmap: 9 pass took: 7.190579 (none: 242992; res: 19152; super: 0; other: 0)
mmap: 10 pass took: 6.915482 (none: 239308; res: 22836; super: 0; other: 0)
mmap: 11 pass took: 6.565909 (none: 232835; res: 29309; super: 0; other: 0)
mmap: 12 pass took: 6.423945 (none: 226160; res: 35984; super: 0; other: 0)
mmap: 13 pass took: 6.315385 (none: 208555; res: 53589; super: 0; other: 0)
mmap: 14 pass took: 6.760780 (none: 192805; res: 69339; super: 0; other: 0)
mmap: 15 pass took: 5.721513 (none: 174497; res: 87647; super: 0; other: 0)
mmap: 16 pass took: 5.004424 (none: 155938; res: 106206; super: 0; other: 0)
mmap: 17 pass took: 4.224926 (none: 135639; res: 126505; super: 0; other: 0)
mmap: 18 pass took: 3.749608 (none: 117952; res: 144192; super: 0; other: 0)
mmap: 19 pass took: 3.398084 (none: 99066; res: 163078; super: 0; other: 0)
mmap: 20 pass took: 3.029557 (none: 74994; res: 187150; super: 0; other: 0)
mmap: 21 pass took: 2.379430 (none: 55231; res: 206913; super: 0; other: 0)
mmap: 22 pass took: 2.046521 (none: 40786; res: 221358; super: 0; other: 0)
mmap: 23 pass took: 1.152797 (none: 30311; res: 231833; super: 0; other: 0)
mmap: 24 pass took: 0.972617 (none: 16196; res: 245948; super: 0; other: 0)
mmap: 25 pass took: 0.577515 (none: 8286; res: 253858; super: 0; other: 0)
mmap: 26 pass took: 0.380738 (none: 3712; res: 258432; super: 0; other: 0)
mmap: 27 pass took: 0.253583 (none: 1193; res: 260951; super: 0; other: 0)
mmap: 28 pass took: 0.157508 (none: 0; res: 262144; super: 0; other: 0)
mmap: 29 pass took: 0.156169 (none: 0; res: 262144; super: 0; other: 0)
mmap: 30 pass took: 0.156550 (none: 0; res: 262144; super: 0; other: 0)
If I run this:
$ cat /mnt/random-1024 > /dev/null
before the test, then the result is the following:
$ ./mmap /mnt/random-1024 5
mmap: 1 pass took: 0.337657 (none: 0; res: 262144; super: 0; other: 0)
mmap: 2 pass took: 0.186137 (none: 0; res: 262144; super: 0; other: 0)
mmap: 3 pass took: 0.186132 (none: 0; res: 262144; super: 0; other: 0)
mmap: 4 pass took: 0.186535 (none: 0; res: 262144; super: 0; other: 0)
mmap: 5 pass took: 0.190353 (none: 0; res: 262144; super: 0; other: 0)
This is what I expect. But why doesn't this work without reading the
file manually?
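(The test attached to the original mail is not reproduced here. A minimal sketch of such a residency checker, using mmap(2) and mincore(2), might look like the following; the function name and error handling are illustrative, and only the low "in core" bit of each mincore entry is tested, whereas FreeBSD's mincore also reports flags such as MINCORE_SUPER for superpage mappings.)

```c
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Fault in every page of `path` through a shared read-only mapping,
 * then ask mincore(2) how many of its pages are resident.
 * Returns the resident page count, or -1 on error.
 */
static long
touch_and_count_resident(const char *path)
{
	int fd = open(path, O_RDONLY);
	if (fd < 0)
		return (-1);
	struct stat st;
	if (fstat(fd, &st) != 0) {
		close(fd);
		return (-1);
	}
	long pgsz = sysconf(_SC_PAGESIZE);
	size_t len = (size_t)st.st_size;
	if (len == 0) {
		close(fd);
		return (0);
	}
	size_t npages = (len + (size_t)pgsz - 1) / (size_t)pgsz;
	char *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		close(fd);
		return (-1);
	}
	volatile char sink = 0;
	for (size_t i = 0; i < npages; i++)
		sink += p[i * (size_t)pgsz];	/* fault each page in */
	(void)sink;
	unsigned char *vec = malloc(npages);
	long res = -1;
	if (vec != NULL && mincore(p, len, (void *)vec) == 0) {
		res = 0;
		for (size_t i = 0; i < npages; i++)
			if (vec[i] & 1)		/* MINCORE_INCORE */
				res++;
	}
	free(vec);
	munmap(p, len);
	close(fd);
	return (res);
}
```

Running this repeatedly over a freshly mounted filesystem is what produces the "none"/"res" progression shown above.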
The issue seems to be some change in the behaviour of the reservation
or phys allocator. I Cc:ed Alan.
I'm pretty sure that the behavior here hasn't significantly changed in
about twelve years. Otherwise, I agree with your analysis.
On more than one occasion, I've been tempted to change:

	pmap_remove_all(mt);
	if (mt->dirty != 0)
		vm_page_deactivate(mt);
	else
		vm_page_cache(mt);

to:

	vm_page_dontneed(mt);
because I suspect that the current code does more harm than good. In
theory, it saves activations of the page daemon. However, more often
than not, I suspect that we are spending more on page reactivations
than
we are saving on page daemon activations. The sequential access
detection heuristic is just too easily triggered. For example, I've
seen it triggered by demand paging of the gcc text segment. Also, I
think that pmap_remove_all() and especially vm_page_cache() are too
severe for a detection heuristic that is so easily triggered.
Are you planning to commit this?
Not yet. I did some tests with a file that was several times larger than
DRAM, and I didn't like what I saw. Initially, everything behaved as
expected, but about halfway through the test the bulk of the pages were
active. Despite the call to pmap_clear_reference() in
vm_page_dontneed(), the page daemon is finding the pages to be
referenced and reactivating them. The net result is that the time it
takes to read the file (from a relatively fast SSD) goes up by about
12%. So, this still needs work.
Hi Alan,
What do you think about attached patch?
Sorry for the slow reply, I've been rather busy for the past couple of
weeks. What you propose is clearly good for sequential accesses, but not
so good for random accesses. Keep in mind, the potential costs of
unconditionally increasing the read window include not only wasted I/O
but also increased memory pressure. Rather than argue about which is
more important, sequential or random access, I think it's more
productive to replace the sequential access heuristic. The current
heuristic is just not that sophisticated. It's easy to do better.
The attached patch implements a new heuristic, which starts with the
same initial read window as the current heuristic, but arithmetically
grows the window on sequential page faults. From a stylistic standpoint,
this patch also cleanly separates the "read ahead" logic from the "cache
behind" logic.
At the same time, this new heuristic is more selective about performing
cache behind. It requires three or four sequential page faults before
cache behind is enabled. More precisely, it requires the read ahead
window to reach its maximum size before cache behind is enabled.
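(The shape of that logic can be modeled in a few lines of userland C. The constants and struct below are illustrative, not the actual vm_fault code; with an assumed initial window of 4 pages and a 16-page maximum, cache behind switches on after the third sequential fault, matching the "three or four" figure above.)

```c
#define FAULT_WIN_INIT	4	/* pages; assumed initial window */
#define FAULT_WIN_MAX	16	/* pages; assumed maximum window */

struct seq_state {
	int window;		/* current read-ahead window, in pages */
	int cache_behind;	/* enabled only once window is maximal */
};

/*
 * Model of the fault-time update: `sequential` is nonzero when the
 * faulting address immediately follows the previous fault.
 */
static void
on_fault(struct seq_state *s, int sequential)
{
	if (!sequential) {
		/* Non-sequential fault: collapse back to the start. */
		s->window = FAULT_WIN_INIT;
		s->cache_behind = 0;
		return;
	}
	if (s->window < FAULT_WIN_MAX)
		s->window += FAULT_WIN_INIT;	/* arithmetic growth */
	if (s->window >= FAULT_WIN_MAX)
		s->cache_behind = 1;
}
```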
For long, sequential accesses, the results of my performance tests are
just as good as with unconditionally increasing the window size. I'm
also seeing
fewer pages needlessly cached by the cache behind heuristic. That said,
there is still room for improvement. We are still not achieving the same
sequential performance as "dd", and there are still more pages being
cached than I would like.
Alan
I've tested your patch extensively and it showed good results. I've
committed it to our tree and it will soon be on the production cluster.
Thanks a lot for the help and your improvements!
--
Andrey Zonov
_______________________________________________
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"