We have many vma manipulation functions that are fast in the typical case,
but can optionally be instructed to populate an unbounded number of ptes
within the region they work on:
- mmap with MAP_POPULATE or MAP_LOCKED flags;
- remap_file_pages() with MAP_NONBLOCK not set or when working on a
  VM_LOCKED vma;
- mmap_region() and all its wrappers when mlock(MCL_FUTURE) is in effect;
- brk() when mlock(MCL_FUTURE) is in effect.
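
For readers who want to see the first case from userspace, here is a
minimal sketch (not part of the series). It maps anonymous memory with
and without MAP_POPULATE and uses mincore() to count how many pages are
resident right after mmap() returns. The helper name
count_resident_pages() and the 256-page cap are assumptions made for
the example only.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* for MAP_ANONYMOUS, MAP_POPULATE, mincore() */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map npages of anonymous memory with extra_flags
 * or'ed into the mmap() flags, then count the pages mincore() reports
 * as resident. Returns -1 on any error or if npages is out of range.
 */
static long count_resident_pages(size_t npages, int extra_flags)
{
    unsigned char vec[256];    /* one status byte per page */
    long page_size = sysconf(_SC_PAGESIZE);
    long resident = 0;
    size_t len, i;
    void *addr;

    if (page_size <= 0 || npages == 0 || npages > sizeof(vec))
        return -1;
    len = npages * (size_t)page_size;

    addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS | extra_flags, -1, 0);
    if (addr == MAP_FAILED)
        return -1;

    if (mincore(addr, len, vec) != 0) {
        munmap(addr, len);
        return -1;
    }

    /* Bit 0 of each byte is set when the page is resident. */
    for (i = 0; i < npages; i++)
        resident += vec[i] & 1;

    munmap(addr, len);
    return resident;
}
```

Comparing count_resident_pages(16, MAP_POPULATE) against
count_resident_pages(16, 0) shows how much pte population work the
flag adds to the mmap() call itself. MAP_POPULATE is best-effort, so
the populated count is not guaranteed to equal the number of pages
requested.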

Current code handles these pte operations locally, while the surrounding
code has to hold the mmap_sem write side since it's manipulating vmas.
This means we're doing an unbounded amount of pte population work with
mmap_sem held, and this causes problems as Andy Lutomirski reported
(we've hit this at Google as well, though it's not entirely clear why
people keep trying to use mlock(MCL_FUTURE) in the first place).

I propose introducing a new mm_populate() function to do this pte
population work after the mmap_sem has been released. mm_populate()
does need to acquire the mmap_sem read side, but critically, it
doesn't need to hold it continuously for the entire duration of the
operation - it can drop it whenever things take too long (such as when
hitting disk for a file read) and re-acquire it later on.
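
The locking pattern described above can be illustrated with a small
userspace analogy. This is only a sketch and is not the code from the
patches: struct toy_mm, toy_populate() and the batch parameter are all
hypothetical, a plain pthread mutex stands in for the read side of
mmap_sem, and a fixed batch size stands in for "things take too long".

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for an address space. */
struct toy_mm {
    pthread_mutex_t lock;      /* plays the role of mmap_sem */
    unsigned char *populated;  /* one flag per page */
    size_t npages;             /* current size of the mapping */
};

/*
 * Populate pages [start, end) in batches of at most `batch` pages.
 * The lock is held for one batch at a time and dropped in between, so
 * other users of the lock never wait for more than one batch of work.
 * Returns the number of times the lock was taken (0 if batch is 0).
 *
 * The toy assumes a single populating thread.
 */
static size_t toy_populate(struct toy_mm *mm, size_t start, size_t end,
                           size_t batch)
{
    size_t acquisitions = 0;

    if (batch == 0)
        return 0;

    while (start < end) {
        size_t stop;

        pthread_mutex_lock(&mm->lock);
        acquisitions++;

        /* Re-check under the lock: the mapping may have changed
         * while the lock was dropped. */
        if (end > mm->npages)
            end = mm->npages;
        if (start >= end) {
            pthread_mutex_unlock(&mm->lock);
            break;
        }

        stop = (end - start > batch) ? start + batch : end;
        for (; start < stop; start++)
            mm->populated[start] = 1;

        /* Drop the lock before the next batch. */
        pthread_mutex_unlock(&mm->lock);
    }
    return acquisitions;
}
```

The point of the sketch is the loop shape: the amount of work done per
lock hold is bounded, and the range is revalidated on every
re-acquisition since it may have changed in the meantime.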

The following patches are against v3.7:

- Patches 1-2 fix some issues I noticed while working on the existing code.
  If needed, they could potentially go in before the rest of the patches.

- Patch 3 introduces the new mm_populate() function and changes
  mmap_region() call sites to use it after they drop mmap_sem. This is
  inspired by Andy Lutomirski's proposal and is built as an extension
  of the work I had previously done for mlock() and mlockall() around
  v2.6.38-rc1. I had tried doing something similar at the time but had
  given up as there were so many do_mmap() call sites; the recent cleanups
  by Linus and Viro are a tremendous help here.

- Patches 4-6 convert some of the less-obvious places doing unbounded
  pte populates to the new mm_populate() mechanism.

- Patches 7-8 are code cleanups that are made possible by the
  mm_populate() work. In particular, they remove more code than the
  entire patch series added, which should be a good thing :)

- Patch 9 is optional for this entire series. It only helps to deal more
  nicely with racy userspace programs that might modify their mappings
  while we're trying to populate them. It adds a new VM_POPULATE flag
  on the mappings we do want to populate, so that if userspace replaces
  them with mappings it doesn't want populated, mm_populate() won't
  populate those replacement mappings.

Michel Lespinasse (9):
  mm: make mlockall preserve flags other than VM_LOCKED in def_flags
  mm: remap_file_pages() fixes
  mm: introduce mm_populate() for populating new vmas
  mm: use mm_populate() for blocking remap_file_pages()
  mm: use mm_populate() when adjusting brk with MCL_FUTURE in effect.
  mm: use mm_populate() for mremap() of VM_LOCKED vmas
  mm: remove flags argument to mmap_region
  mm: directly use __mlock_vma_pages_range() in find_extend_vma()
  mm: introduce VM_POPULATE flag to better deal with racy userspace programs

 arch/tile/mm/elf.c   |    1 -
 fs/aio.c             |    6 +++-
 include/linux/mm.h   |   23 +++++++++---
 include/linux/mman.h |    4 ++-
 ipc/shm.c            |   12 ++++---
 mm/fremap.c          |   51 ++++++++++++++-------------
 mm/internal.h        |    4 +-
 mm/memory.c          |   24 -------------
 mm/mlock.c           |   94 +++++++++++++------------------------------------
 mm/mmap.c            |   77 ++++++++++++++++++++++++----------------
 mm/mremap.c          |   25 +++++++------
 mm/nommu.c           |    5 ++-
 mm/util.c            |    6 +++-
 13 files changed, 154 insertions(+), 178 deletions(-)

-- 
1.7.7.3