Hello, sorry for resurrecting this old thread, but I need to test my understanding of the problem and I'd like to ask for a clarification.

On Thu, Aug 4, 2011 at 7:09 PM, richard -rw- weinberger
<richard.weinber...@gmail.com> wrote:
> On Thu, Aug 4, 2011 at 5:42 PM, Riccardo Murri <riccardo.mu...@gmail.com> wrote:
>>
>> I see that each UML instance starts a variable number of threads/processes.
>>
>> I'm using UML in a batch system (Sun Grid Engine 6.2); SGE kills my
>> jobs because they exceed the allowed memory reservation. My guess is
>> that SGE miscomputes the memory usage by computing the total over all
>> threads/processes without accounting for shared pages.
>> [...]
>
> UML starts on the host side per process one helper thread.
> (In SKAS0 mode, which is the default.)
> So, you can limit the number of host threads by starting less
> processes within UML. ;)
>
> Most likely SGE does not detect them as threads because UML uses
> clone() to create them...

Actually, we have seen the same behavior in TORQUE as well, so this is
becoming a major issue for us.

The question is this: I see in the libc sources that clone() is used to
create threads as well. So I guess the difference lies in the flags
that are passed to clone() in the two cases?

Now, libc create_thread() uses (lines 182--188 of file
"nptl/sysdeps/pthread/createthread.c"):

    int clone_flags = (CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGNAL
                       | CLONE_SETTLS | CLONE_PARENT_SETTID
                       | CLONE_CHILD_CLEARTID | CLONE_SYSVSEM
    #if __ASSUME_NO_CLONE_DETACHED == 0
                       | CLONE_DETACHED
    #endif
                       | 0);

whereas, if I'm not mistaken, UML uses (file "kernel/skas/clone.c"):

    err = stub_syscall2(__NR_clone, CLONE_PARENT | CLONE_FILES | SIGCHLD,
                        STUB_DATA + UM_KERN_PAGE_SIZE / 2 - sizeof(void *));

But then this means that the additional processes created by UML do
not share the memory space (no CLONE_VM), correct?
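
As a sanity check on the CLONE_VM point, here is a minimal sketch of what
I understand the flag to do. It assumes Linux with the glibc clone()
wrapper; the names shared_value, child_fn and value_seen_by_parent are
made up for this illustration and come from neither UML nor libc.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define STACK_SIZE (1024 * 1024)

/* Hypothetical global used to observe whether memory is shared. */
static volatile int shared_value = 0;

/* Child entry point: write to the global, then terminate. */
static int child_fn(void *arg)
{
    (void)arg;
    shared_value = 42;
    return 0;
}

/*
 * Run child_fn through clone() with the given extra flags and return
 * the value of shared_value that the parent sees afterwards, or -1 if
 * the child could not be created or waited for.
 */
int value_seen_by_parent(int extra_flags)
{
    char *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -1;

    shared_value = 0;

    /* The stack grows downwards, so pass the top of the buffer. */
    pid_t pid = clone(child_fn, stack + STACK_SIZE,
                      extra_flags | SIGCHLD, NULL);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    int status;
    if (waitpid(pid, &status, 0) == -1) {
        free(stack);
        return -1;
    }
    assert(WIFEXITED(status)); /* child_fn returns normally */

    free(stack);
    return shared_value;
}
```

If my reading is right, value_seen_by_parent(CLONE_VM) returns 42,
because the child writes into the parent's address space, while
value_seen_by_parent(0) returns 0, because the child only modifies its
own copy-on-write copy. Either way the child is a separate process with
its own PID, which is what a batch system would see.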

Thus:

- batch system schedulers rightly consider each UML "thread" as a
  separate process;

- however, UML "threads" do share a large portion of the memory, as can
  be seen from this "top" output:

      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
     6467 admin     15   0 32.0g  13g  13g S  0.0 27.7   0:00.00 kernel64-3.0.4
     6466 admin     16   0 32.0g  13g  13g S  0.0 27.7   0:00.15 kernel64-3.0.4
     6465 admin     22   0 32.0g  13g  13g S  0.0 27.7   0:00.00 kernel64-3.0.4
     6458 admin     15   0 32.0g  13g  13g S 39.2 27.7  37:00.04 kernel64-3.0.4
     7437 admin     15   0 12.0g  12g  12g T 52.9 25.6  70:54.39 kernel64-3.0.4

- so the problem lies in the algorithm that SGE and TORQUE apply for
  computing the amount of memory used, which apparently just sums the
  VSZ of every process (fast), instead of counting pages while ensuring
  that each shared page is counted only once (slow)?

Thanks for any clarification!

Riccardo

_______________________________________________
User-mode-linux-user mailing list
User-mode-linux-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-user