On Mon, Apr 9, 2012 at 1:18 PM, Konstantin Belousov <kostik...@gmail.com> wrote:
> On Mon, Apr 09, 2012 at 11:17:41AM +0400, Andrey Zonov wrote:
>> On 06.04.2012 12:13, Konstantin Belousov wrote:
>> >On Thu, Apr 05, 2012 at 11:54:53PM +0400, Andrey Zonov wrote:
[snip]
>> >>I always thought that active memory is the sum of the resident memory of
>> >>all processes, inactive shows the disk cache, and wired shows the kernel itself.
>> >So you are wrong. Both active and inactive memory can be mapped or
>> >unmapped, and both can belong to vnode or to anonymous objects etc.
>> >The active/inactive distinction reflects only the number of references
>> >noted by the pagedaemon, or some other page history, like the way the
>> >page was unwired.
>> >
>> >Wired does not necessarily mean kernel-used pages; user processes can
>> >wire their pages as well.
>>
>> Let's talk about that in details.
>>
>> My understanding is the following:
>>
>> Active memory: the memory which is referenced by application.  An
> Assuming the part 'by application' is removed, this sentence is almost right.
> Any managed mapping of the page participates in the active references.
>
>> application may get memory only through mmap() (the allocator doesn't
>> use brk()/sbrk() any more).  The resident memory of an application is
>> the sum of physically used memory.  So, the sum of RSS is the active
>> memory.
> First, brk/sbrk is still used. Second, there is no requirement that
> resident pages are referenced. E.g. a page could have been part of a
> buffer, and unwiring when the buffer was dissolved put it into the
> inactive state. Or the pagedaemon cleared the reference and moved the
> page to the inactive queue. Or the page was prefaulted by various
> optimizations.
>
> Moreover, there is a subtle difference between 'resident' and 'not
> causing a fault on access'. A page may be resident, but its pte was not
> preinstalled, or the pte was flushed, etc.

From the user's point of view: how can memory be active if no one (I mean
no application) uses it?

What I did not see at once is that a program which had worked for a long
time with a big mmap()'ed file could not work well (many page faults) with
a new version of the file, until I manually flushed active memory by
re-mounting the FS.  The new version couldn't force out the old one.  In
my opinion, if the VM moved cached objects to the inactive queue after
program termination, I wouldn't see this problem.
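Roughly the pattern I mean, as a sketch (the file path and the
touch-then-check logic are made up for illustration, this is not the real
program): map the file, touch it the way the long-running program does,
then use mincore(2) just to see how many of its pages stayed resident:

    /* Hypothetical reproduction sketch, not the real program. */
    #include <sys/types.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <err.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int
    main(void)
    {
            const char *path = "/mnt/data.v1";  /* made-up file name */
            long pgsz = sysconf(_SC_PAGESIZE);
            struct stat st;
            int fd;
            char *p, *vec;
            size_t npages, resident, i;
            off_t off;
            volatile char sum = 0;

            if ((fd = open(path, O_RDONLY)) == -1)
                    err(1, "open");
            if (fstat(fd, &st) == -1)
                    err(1, "fstat");
            if ((p = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED,
                fd, 0)) == MAP_FAILED)
                    err(1, "mmap");

            /* Touch every page, as the long-running program would. */
            for (off = 0; off < st.st_size; off += pgsz)
                    sum += p[off];

            /* Ask the kernel how many of the pages are still resident. */
            npages = (st.st_size + pgsz - 1) / pgsz;
            if ((vec = malloc(npages)) == NULL)
                    err(1, "malloc");
            if (mincore(p, st.st_size, vec) == -1)
                    err(1, "mincore");
            for (resident = 0, i = 0; i < npages; i++)
                    if (vec[i] & MINCORE_INCORE)
                            resident++;
            printf("%zu of %zu pages resident\n", resident, npages);
            return (0);
    }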

>>
>> Inactive memory: the memory which has no references.  Once we call
>> read() on a file, the file is in inactive memory, because we have no
>> references to this object, we just read it.  This is also the memory
>> released by free().
> On buffer dissolution, the buffer cache explicitly puts the pages
> constituting the buffer into the inactive queue. In fact, this is not
> quite right, e.g. if the same pages are mapped and actively referenced,
> then the pagedaemon now has slightly more work to move the page from
> inactive back to active.
>

Yes, sure, if someone else uses the object it should be active, and it
would be even better to introduce a new "SHARED" counter, like the one in
MacOSX and Linux.

> And, free(3) operates at so much higher a level than the vm subsystem
> that describing the interaction between the two in any definitive way is
> impossible. Old naive mallocs put the block description at the beginning
> of the block, actually causing free() to reference at least the first
> page of the block. Jemalloc often does madvise(MADV_FREE) for large
> freed allocations. MADV_FREE moves pages between queues probabilistically.
>

That's exactly what I meant by free().  We drop act_count to 0 and
move the page to the inactive queue with vm_page_dontneed().
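Roughly what that looks like from userland, as a minimal sketch (just the
system-call sequence for a large allocation, not jemalloc's actual code):

    /* Sketch of the MADV_FREE path for a "large" freed allocation. */
    #include <sys/mman.h>
    #include <err.h>
    #include <string.h>

    #define BIG     (16 * 1024 * 1024)  /* arbitrary "large" size */

    int
    main(void)
    {
            char *p;

            p = mmap(NULL, BIG, PROT_READ | PROT_WRITE,
                MAP_ANON | MAP_PRIVATE, -1, 0);
            if (p == MAP_FAILED)
                    err(1, "mmap");

            memset(p, 0xa5, BIG);       /* dirty the pages */

            /*
             * "Free" the region: the pages stay mapped, but the kernel is
             * told their contents may be discarded, so they can be demoted
             * toward the inactive queue instead of being written to swap.
             */
            if (madvise(p, BIG, MADV_FREE) == -1)
                    err(1, "madvise");
            return (0);
    }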

>>
>> Cache memory: I don't know what it is.  It's always small enough not to
>> think about it.
> This was the bug you reported, and which Alan fixed on Sunday.
>

I've tested this patch under 9.0-STABLE and have to say that it
introduces interactivity problems on machines under heavy disk load.
With the patch that I tested before, I didn't observe such problems.

>>
>> Wired memory: kernel memory, and yes, an application may get wired
>> memory through mlock()/mlockall(), but I haven't seen any real
>> application which calls mlock().
> ntpd and amd from the base system. gpg and similar programs try to mlock
> the key store to avoid leaking sensitive material to swap. cdrecord(8)
> tried to mlock itself to avoid indefinite stalls during write.
>

Nice catch ;-)
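For completeness, the pattern such programs use looks roughly like this
(simplified sketch with a made-up key buffer, not gpg's actual code):

    /* Wire a small buffer holding sensitive data so it cannot be swapped. */
    #include <sys/mman.h>
    #include <err.h>
    #include <string.h>

    #define KEYLEN  4096                /* made-up key-store size */

    int
    main(void)
    {
            static char key[KEYLEN];

            /* May require privilege (or a suitable RLIMIT_MEMLOCK). */
            if (mlock(key, sizeof(key)) == -1)
                    err(1, "mlock");

            /* ... use the key material; it is accounted as wired now ... */

            memset(key, 0, sizeof(key));        /* scrub before unwiring */
            if (munlock(key, sizeof(key)) == -1)
                    err(1, "munlock");
            return (0);
    }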

>
>>
>> >>
>> >>>>
>> >>>>Read the file:
>> >>>>$ cat /mnt/random>   /dev/null
>> >>>>
>> >>>>Mem: 79M Active, 55M Inact, 1746M Wired, 1240M Buf, 45G Free
>> >>>>
>> >>>>Now the file is in wired memory.  I do not understand why so.
>> >>>You do use UFS, right ?
>> >>
>> >>Yes.
>> >>
>> >>>There are enough buffer headers and buffer KVA
>> >>>to have buffers allocated for the whole file content. Since buffers wire
>> >>>the corresponding pages, you get pages migrated to wired.
>> >>>
>> >>>When there appears a buffer pressure (i.e., any other i/o started),
>> >>>the buffers will be repurposed and pages moved to inactive.
>> >>>
>> >>
>> >>OK, how can I get amount of disk cache?
>> >You cannot. At least I am not aware of any counter that keeps track
>> >of the resident pages belonging to vnode pager.
>> >
>> >Buffers should not be thought of as the disk cache; pages cache disk
>> >content. Instead, VMIO buffers only provide a bread()/bwrite()-compatible
>> >interface to the page cache (*) for filesystems.
>> >(*) - The term cache is used here in a generic sense, not to be confused
>> >with the cached pages counter from top etc.
>> >
>>
>> Yes, I know that.  Let me ask my question about buffers once again.
>> Is it reasonable to use 10% of physical memory for them, or could we
>> set a rational upper limit automatically?
>>

This question is still without an answer :)
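Not an answer, but for reference this is how I peek at the current numbers
while thinking about it (assuming the usual vfs.bufspace/vfs.maxbufspace
sysctls; the integer width is probed at run time since it may differ
between versions):

    /* Print current and maximum buffer space via sysctl(3). */
    #include <sys/types.h>
    #include <sys/sysctl.h>
    #include <err.h>
    #include <stdio.h>
    #include <string.h>

    static long long
    get_num(const char *name)
    {
            union { int i; long l; long long ll; } u;
            size_t len = sizeof(u);

            memset(&u, 0, sizeof(u));
            if (sysctlbyname(name, &u, &len, NULL, 0) == -1)
                    err(1, "%s", name);
            if (len == sizeof(int))
                    return (u.i);
            if (len == sizeof(long))
                    return (u.l);
            return (u.ll);
    }

    int
    main(void)
    {
            printf("vfs.bufspace:    %lld\n", get_num("vfs.bufspace"));
            printf("vfs.maxbufspace: %lld\n", get_num("vfs.maxbufspace"));
            return (0);
    }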

>> >>
>> >>>>
>> >>>>Could you please give me an explanation of active/inactive/wired memory?
>> >>>>
>> >>>>
>> >>>>>because I suspect that the current code does more harm than good. In
>> >>>>>theory, it saves activations of the page daemon. However, more often
>> >>>>>than not, I suspect that we are spending more on page reactivations
>> >>>>>than
>> >>>>>we are saving on page daemon activations. The sequential access
>> >>>>>detection heuristic is just too easily triggered. For example, I've
>> >>>>>seen
>> >>>>>it triggered by demand paging of the gcc text segment. Also, I think
>> >>>>>that pmap_remove_all() and especially vm_page_cache() are too severe
>> >>>>>for
>> >>>>>a detection heuristic that is so easily triggered.
>> >>>>>
>> >>>>[snip]
>> >>>>
>> >>>>--
>> >>>>Andrey Zonov
>> >>
>> >>--
>> >>Andrey Zonov
>>
>> --
>> Andrey Zonov



-- 
Andrey Zonov