If we map the region with MAP_NORESERVE and MAP_SHARED, we can skip to check reserve counting and eventually we cannot be ensured to allocate a huge page in fault time. With following example code, you can easily find this situation.
Assume 2MB, nr_hugepages = 100 fd = hugetlbfs_unlinked_fd(); if (fd < 0) return 1; size = 200 * MB; flag = MAP_SHARED; p = mmap(NULL, size, PROT_READ|PROT_WRITE, flag, fd, 0); if (p == MAP_FAILED) { fprintf(stderr, "mmap() failed: %s\n", strerror(errno)); return -1; } size = 2 * MB; flag = MAP_ANONYMOUS | MAP_SHARED | MAP_HUGETLB | MAP_NORESERVE; p = mmap(NULL, size, PROT_READ|PROT_WRITE, flag, -1, 0); if (p == MAP_FAILED) { fprintf(stderr, "mmap() failed: %s\n", strerror(errno)); } p[0] = '0'; sleep(10); During executing sleep(10), run 'cat /proc/meminfo' on another process. You'll find a mentioned problem. non reserved shared mapping should not eat into reserve space. So return error when we don't find enough free space. Reviewed-by: Wanpeng Li <liw...@linux.vnet.ibm.com> Reviewed-by: Aneesh Kumar K.V <aneesh.ku...@linux.vnet.ibm.com> Signed-off-by: Joonsoo Kim <iamjoonsoo....@lge.com> diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 8a61638..87e73bd 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -464,6 +464,8 @@ void reset_vma_resv_huge_pages(struct vm_area_struct *vma) /* Returns true if the VMA has associated reserve pages */ static int vma_has_reserves(struct vm_area_struct *vma) { + if (vma->vm_flags & VM_NORESERVE) + return 0; if (vma->vm_flags & VM_MAYSHARE) return 1; if (is_vma_resv_set(vma, HPAGE_RESV_OWNER)) -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/