On 08/25/2015 03:41 PM, Michal Hocko wrote: > On Fri 21-08-15 14:31:32, Eric B Munson wrote: > [...] >> I am in the middle of implementing lock on fault this way, but I cannot >> see how we will hanlde mremap of a lock on fault region. Say we have >> the following: >> >> addr = mmap(len, MAP_ANONYMOUS, ...); >> mlock(addr, len, MLOCK_ONFAULT); >> ... >> mremap(addr, len, 2 * len, ...) >> >> There is no way for mremap to know that the area being remapped was lock >> on fault so it will be locked and prefaulted by remap. How can we avoid >> this without tracking per vma if it was locked with lock or lock on >> fault? > > Yes mremap is a problem and it is very much similar to mmap(MAP_LOCKED). > It doesn't guarantee the full mlock semantic because it leaves partially > populated ranges behind without reporting any error.
Hm, that's right. > Considering the current behavior I do not thing it would be terrible > thing to do what Konstantin was suggesting and populate only the full > ranges in a best effort mode (it is done so anyway) and document the > behavior properly. > " > If the memory segment specified by old_address and old_size is > locked (using mlock(2) or similar), then this lock is maintained > when the segment is resized and/or relocated. As a consequence, > the amount of memory locked by the process may change. > > If the range is already fully populated and the range is > enlarged the new range is attempted to be fully populated > as well to preserve the full mlock semantic but there is no > guarantee this will succeed. Partially populated (e.g. created by > mlock(MLOCK_ONFAULT)) ranges do not have the full mlock semantic > so they are not populated on resize. > " > > So what we have as a result is that partially populated ranges are > preserved and fully populated ones work in the best effort mode the same > way as they are now. > > Does that sound at least remotely reasonably? I'll basically repeat what I said earlier: - mremap scanning existing pte's to figure out the population would slow it down for no good reason - it would be unreliable anyway: - example: was the area completely populated because MLOCK_ONFAULT was not used or because the process faulted it already - example: was the area not completely populated because MLOCK_ONFAULT was used, or because mmap(MAP_LOCKED) failed to populate it fully? I think the first point is a pointless regression for workloads that use just plain mlock() and don't want the onfault semantics. Unless there's some shortcut? Does vma have a counter of how much is populated? (I don't think so?)