On Fri, Jan 21, 2011 at 2:58 PM, Alan Cox <alan.l....@gmail.com> wrote:
> On Fri, Jan 21, 2011 at 11:44 AM, John Baldwin <j...@freebsd.org> wrote:
>
>> On Friday, January 21, 2011 11:09:10 am Sergey Kandaurov wrote:
>> > Hello.
>> >
>> > Some time ago I ran into a problem booting with 400GB of physmem.
>> > The problem is that vm.max_proc_mmap overflows its type at such
>> > high values, which results in a broken mmap() syscall.
>> > The max_proc_mmap value is a signed int, roughly calculated
>> > in vmmapentry_rsrc_init() as a quotient of the u_long vm_kmem_size:
>> > vm_kmem_size / sizeof(struct vm_map_entry) / 100.
>> >
>> > Although the value was quite low when it was introduced in svn r57263
>> > (e.g., the related commit log says:
>> > "The value defaults to around 9000 for a 128MB machine."),
>> > the problem is observed on amd64, where since r212784 the KVA space
>> > size is in effect tied only to the physical memory size.
>> >
>> > With INT_MAX being 0x7fffffff and sizeof(struct vm_map_entry)
>> > being 120, slightly less than 256GB is enough to reproduce
>> > the problem.
>> >
>> > I rewrote vmmapentry_rsrc_init() to set a large enough limit for
>> > max_proc_mmap, just to protect against integer type overflow.
>> > As it's also possible to tune this value at run time, I also added a
>> > simple anti-foot-shooting constraint to its sysctl handler.
>> > I'm not sure, though, whether the second part is worth committing.
>> >
>> > As this patch may cause some bikeshedding,
>> > I'd like to hear your comments before I commit it.
>> >
>> > http://plukky.net/~pluknet/patches/max_proc_mmap.diff
>>
>> Is there any reason we can't just make this variable and sysctl a long?
>>
>>
> Or just delete it.
>
> 1. Contrary to what the commit message says, this sysctl does not
> effectively limit the number of vm map entries. It only limits the number
> that are created by one system call, mmap().
> Other system calls create vm map entries just as easily, for example,
> mprotect(), madvise(), mlock(), and minherit(). Basically, anything that
> alters the properties of a mapping can. Thus, in 2000, after this sysctl
> was added, the same resource-exhaustion-induced crash could have been
> reproduced by trivially changing the program in PR/16573 to do an
> mprotect() or two.
>
> In a nutshell, if you really want to limit the number of vm map entries
> that a process can allocate, the implementation is a bit more involved
> than what was done for this sysctl.
>
> 2. UMA implements M_WAITOK, whereas the old zone allocator in 2000 did
> not. Moreover, vm map entries for user maps are allocated with M_WAITOK.
> So, the exact crash reported in PR/16573 couldn't happen any longer.
>
>

Actually, I take back part of what I said here. The old zone allocator did
implement something like M_WAITOK, and that appears to have been used for
user maps. However, the crash described in PR/16573 was actually on the
allocation of a vm map entry within the *kernel* address space for a
process U area. This type of allocation did not use the old zone
allocator's equivalent of M_WAITOK. However, we no longer have U areas, so
the exact crash scenario is clearly no longer possible. Interestingly, the
sysctl in question has no direct effect on the allocation of kernel vm map
entries. So, I remain skeptical that this sysctl is preventing any
resource-exhaustion-based panics in the current kernel. Again, I would be
thrilled to see one or more people do some testing, such as rerunning the
program from PR/16573.

> 3. We now have the "vmemoryuse" resource limit. When this sysctl was
> defined, we didn't. Limiting the virtual memory indirectly but effectively
> limits the number of vm map entries that a process can allocate.
>
> In summary, I would do a little due diligence, for example, run the
> program from PR/16573 with the limit disabled.
> If you can't reproduce the crash, in other words, if nothing contradicts
> point #2 above, then I would just delete this sysctl.
>
> Alan
>
>
_______________________________________________
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"