On Fri, 27 Jul 2007 09:23:41 +0200 Mike Galbraith <[EMAIL PROTECTED]> wrote:
> On Fri, 2007-07-27 at 07:13 +0200, Mike Galbraith wrote: > > On Thu, 2007-07-26 at 11:05 -0700, Andrew Morton wrote: > > > > drops caches prior to both updatedb runs. > > > > > > I think that was the wrong thing to do. That will leave gobs of free > > > memory for updatedb to populate with dentries and inodes. > > > > > > Instead, fill all of memory up with pagecache, then do the updatedb. See > > > how much pagecache is left behind and see how large the vfs caches end up. > > I didn't _fill_ memory, but loaded it up a bit with some real workload > data... > > I tried time sh -c 'git diff v2.6.11 HEAD > /dev/null' to populate the > cache, and tried different values for vfs_cache_pressure. Nothing > prevented git's data from being trashed by updatedb. Turning the knob > downward rapidly became very unpleasant due to swap, (with 0 not > surprisingly being a true horror) but turning it up didn't help git one > bit. The amount of data that had to be re-read with stock 100 or 10000 > was the same, or at least so close that you couldn't see a difference in > vmstat and wall-clock. Cache sizes varied, but the bottom line didn't. > (wasn't surprised, seems quite reasonable that git's data looks old and > useless to the reclaim logic when updatedb runs in between git runs) > Did a bit of playing with this with 128MB of memory. - drop caches - read a 1MB file - run slocate.cron With vfs_cache_pressure=100: MemTotal: 116316 kB MemFree: 3196 kB Buffers: 54408 kB Cached: 5128 kB SwapCached: 0 kB Active: 41728 kB Inactive: 27540 kB HighTotal: 0 kB HighFree: 0 kB LowTotal: 116316 kB LowFree: 3196 kB SwapTotal: 1020116 kB SwapFree: 1019496 kB Dirty: 0 kB Writeback: 0 kB AnonPages: 9760 kB Mapped: 3808 kB Slab: 40468 kB SReclaimable: 34824 kB SUnreclaim: 5644 kB PageTables: 720 kB NFS_Unstable: 0 kB Bounce: 0 kB CommitLimit: 1078272 kB Committed_AS: 25988 kB VmallocTotal: 901112 kB VmallocUsed: 656 kB VmallocChunk: 900412 kB HugePages_Total: 0 HugePages_Free: 0 HugePages_Rsvd: 0 Hugepagesize: 4096 kB WIth vfs_cache_pressure=10000: MemTotal: 116316 kB MemFree: 3060 kB Buffers: 80792 kB Cached: 5052 kB SwapCached: 0 kB Active: 59432 kB Inactive: 36140 kB HighTotal: 0 kB HighFree: 0 kB LowTotal: 116316 kB LowFree: 3060 kB SwapTotal: 1020116 kB SwapFree: 1019512 kB Dirty: 0 kB Writeback: 0 kB AnonPages: 9756 kB Mapped: 3832 kB Slab: 14304 kB SReclaimable: 7992 kB SUnreclaim: 6312 kB PageTables: 732 kB NFS_Unstable: 0 kB Bounce: 0 kB CommitLimit: 1078272 kB Committed_AS: 26000 kB VmallocTotal: 901112 kB VmallocUsed: 656 kB VmallocChunk: 900412 kB HugePages_Total: 0 HugePages_Free: 0 HugePages_Rsvd: 0 Hugepagesize: 4096 kB so we reaped quite a lot more slab with the higher vfs_cache_pressure. What I think is killing us here is the blockdev pagecache: the pagecache which backs those directory entries and inodes. These pages get read multiple times because they hold multiple directory entries and multiple inodes. These multiple touches will put those pages onto the active list so they stick around for a long time and everything else gets evicted. I've never been very sure about this policy for the metadata pagecache. We read the filesystem objects into the dcache and icache and then we won't read from that page again for a long time (I expect). But the page will still hang around for a long time. It could be that we should leave those pages inactive. <tries it> diff -puN include/linux/buffer_head.h~a include/linux/buffer_head.h --- a/include/linux/buffer_head.h~a +++ a/include/linux/buffer_head.h @@ -130,7 +130,7 @@ BUFFER_FNS(Eopnotsupp, eopnotsupp) BUFFER_FNS(Unwritten, unwritten) #define bh_offset(bh) ((unsigned long)(bh)->b_data & ~PAGE_MASK) -#define touch_buffer(bh) mark_page_accessed(bh->b_page) +#define touch_buffer(bh) do { } while(0) /* If we *know* page->private refers to buffer_heads */ #define page_buffers(page) \ _ vfs_cache_pressure=100: vmm:/home/akpm# cat /proc/meminfo MemTotal: 116524 kB MemFree: 2692 kB Buffers: 51044 kB Cached: 5440 kB SwapCached: 0 kB Active: 19248 kB Inactive: 46996 kB HighTotal: 0 kB HighFree: 0 kB LowTotal: 116524 kB LowFree: 2692 kB SwapTotal: 1020116 kB SwapFree: 1019492 kB Dirty: 2008 kB Writeback: 0 kB AnonPages: 9772 kB Mapped: 3812 kB Slab: 44336 kB SReclaimable: 38792 kB SUnreclaim: 5544 kB PageTables: 720 kB NFS_Unstable: 0 kB Bounce: 0 kB CommitLimit: 1078376 kB Committed_AS: 26108 kB VmallocTotal: 901112 kB VmallocUsed: 648 kB VmallocChunk: 900412 kB HugePages_Total: 0 HugePages_Free: 0 HugePages_Rsvd: 0 Hugepagesize: 4096 kB vfs_cache_pressure=10000 vmm:/home/akpm# cat /proc/meminfo MemTotal: 116524 kB MemFree: 3720 kB Buffers: 79832 kB Cached: 6260 kB SwapCached: 0 kB Active: 18276 kB Inactive: 77584 kB HighTotal: 0 kB HighFree: 0 kB LowTotal: 116524 kB LowFree: 3720 kB SwapTotal: 1020116 kB SwapFree: 1019500 kB Dirty: 2228 kB Writeback: 0 kB AnonPages: 9788 kB Mapped: 3828 kB Slab: 13680 kB SReclaimable: 7676 kB SUnreclaim: 6004 kB PageTables: 736 kB NFS_Unstable: 0 kB Bounce: 0 kB CommitLimit: 1078376 kB Committed_AS: 26112 kB VmallocTotal: 901112 kB VmallocUsed: 648 kB VmallocChunk: 900412 kB HugePages_Total: 0 HugePages_Free: 0 HugePages_Rsvd: 0 Hugepagesize: 4096 kB So again, slab was trimmed a lot more, but all our pagecache still got evicted. ah, but I started that pagecache out on the inactive list. Try again. This time, instead of reading a 1GB file once, let's read an 80MB file four times. <no difference> OK, I saw what happened then. The inode for my 80MB file got reclaimed from icache and that instantly reclaimed all 80MB of pagecache. A single large file probably isn't a good testcase, but the same will happen with multiple files. Higher vfs_cache_pressure will worsen this effect. But it won't happen with mapped files because their inodes aren't reclaimable. More sophisticated testing is needed - there's something in ext3-tools which will mmap, page in and hold a file for you. Anyway, blockdev pagecache is a problem, I expect. It's worth playing with that patch. Another problem is atime updates. You really do want to mount noatime. Because with atimes enabled, each touch of a file will touch its inode and will keep its backing blockdev pagecache page in core. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/