2012/4/2 Samuel Thibault <samuel.thiba...@gnu.org>:
> Sergio Lopez, on Mon 02 Apr 2012 17:09:08 +0200, wrote:
>> On Mon, 02 Apr 2012 00:23:03 +0300
>> Maksym Planeta <mcsim.plan...@gmail.com> wrote:
>> > This function allows user advise the kernel about how to handle paging
>> > input/output in specified memory range. There are several behaviors,
>> > like RANDOM, NORMAL, SEQUENTIAL, WILLNEED and DONTNEED. From the page
>> > fault handler's point of view these behaviors differ only in size of
>> > memory chunk that will be read ahead.
>>
>> I don't think the kernel should be the one to be advised, but the
>> filesystem translators. These are the ones who really know current and
>> future (as they control most of the operations) state of the object,
>
> Do they really? We discussed about it with neal a long time ago, and we
> believed that Mach was at a better position, because it knows about
> each and every mapping. Say for instance that two processes map the same
> file, and access it concurrently, but in a different way. AIUI, the
> translators will get data requests without indication of what mapping is
> pulling it, and thus no correlation between them, thus seemingly
> random.

In some cases, Mach can do a better job at finding a pattern in the
operations over an mmap'ed object. But this can only be used as a hint,
since only the pager has the information to know whether a given page
request can be fulfilled (Mach only knows the object size, which is
provided by the translator via m_o_change_attributes and could even be
outdated).

Sometimes the translator is able to detect a pattern that Mach can't.
This is the case when receiving large write requests over a new or
truncated file. And with clustered pageins, it could be desirable to
increase both object and cluster size, to be able to coalesce pager
faults and m_o_data_unlock requests.

Ideally, both the pager and Mach should know about the hint given by the
user via madvise, and cooperate to pick the best access policy for an
object. But I think an initial implementation with support for a
variable cluster size is enough as a start; fine-grained policies and
heuristics can be implemented later, after careful analysis of their
non-obvious implications.
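
To make the "variable cluster size" idea a bit more concrete, here is a
rough sketch of how a fault handler could turn the advice into a cluster
size. This is not GNU Mach code: the enum, the function names and the
page counts are all invented for illustration. The only points it tries
to capture are that the advice selects how many pages to ask for, and
that the result is clamped to the object size Mach currently knows
about, which may be stale.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical advice values, mirroring the behaviors named in the thread. */
enum advice {
    ADV_NORMAL,
    ADV_RANDOM,
    ADV_SEQUENTIAL,
    ADV_WILLNEED,
    ADV_DONTNEED
};

/* Assumed cluster sizes, in pages, including the faulting page.
   The numbers are placeholders, not values taken from any kernel. */
static size_t advice_cluster_pages(enum advice a)
{
    switch (a) {
    case ADV_RANDOM:     return 1;   /* no read-ahead */
    case ADV_DONTNEED:   return 1;   /* no read-ahead */
    case ADV_SEQUENTIAL: return 16;
    case ADV_WILLNEED:   return 32;
    case ADV_NORMAL:
    default:             return 8;
    }
}

/* Pages to request from the pager for a fault at fault_page, given the
   object size (in pages) as last reported to the kernel. */
size_t cluster_for_fault(enum advice a, size_t fault_page, size_t object_pages)
{
    size_t want = advice_cluster_pages(a);
    assert(want >= 1);

    /* The known size may be outdated: ask only for the faulting page
       and let the pager decide whether it can be provided. */
    if (fault_page >= object_pages)
        return 1;

    /* Do not read ahead past the end of the object as we know it. */
    size_t remaining = object_pages - fault_page;
    return want < remaining ? want : remaining;
}
```

With these made-up numbers, a SEQUENTIAL fault at page 95 of a 100-page
object would request 5 pages, while a RANDOM fault anywhere would
request a single page. The pager still has the last word on what it
actually supplies.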