Hello,

sorry for resurrecting this old thread, but I would like to check my
understanding of the problem and ask for clarification.

On Thu, Aug 4, 2011 at 7:09 PM, richard -rw- weinberger
<richard.weinber...@gmail.com> wrote:
> On Thu, Aug 4, 2011 at 5:42 PM, Riccardo Murri
> <riccardo.mu...@gmail.com> wrote:
>>
>> I see that each UML instance starts a variable number of threads/processes.
>>
>> I'm using UML in a batch system (Sun Grid Engine 6.2); SGE kills my
>> jobs because they exceed the allowed memory reservation.  My guess is
>> that SGE miscomputes the memory usage by computing the total over all
>> threads/processes without accounting for shared pages.
>> [...]
>
> UML starts on the host side per process one helper thread.
> (In SKAS0 mode, which is the default.)
> So, you can limit the number of host threads by starting less
> processes within UML. ;)
>
> Most likely SGE does not detect them as threads because UML uses
> clone() to create them...

We have since seen the same behavior in TORQUE as well, so this is
becoming a major issue for us.

The question is this: I see in the libc sources that clone() is used
to create threads as well, so I guess the difference lies in the flags
that are passed to clone() in the two cases?

Now, libc's create_thread() uses (lines 182--188 of the file
"nptl/sysdeps/pthread/createthread.c"):

      int clone_flags = (CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGNAL
                         | CLONE_SETTLS | CLONE_PARENT_SETTID
                         | CLONE_CHILD_CLEARTID | CLONE_SYSVSEM
    #if __ASSUME_NO_CLONE_DETACHED == 0
                         | CLONE_DETACHED
    #endif
                         | 0);

whereas, if I'm not mistaken, UML uses (file "arch/um/kernel/skas/clone.c"):

    err = stub_syscall2(__NR_clone, CLONE_PARENT | CLONE_FILES | SIGCHLD,
                    STUB_DATA + UM_KERN_PAGE_SIZE / 2 - sizeof(void *));

But then this means that the additional processes created by UML do
not share an address space (no CLONE_VM), correct?
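
To double-check that, here is a minimal host-side sketch (not taken
from UML's sources; the child function, the stack size and the
variable names are made up for the example) showing what the absence
of CLONE_VM means: the child gets its own copy-on-write address
space, so a write made in the child is not visible to the parent.

    /* clone-novm.c : illustrative only; build with "gcc -o clone-novm clone-novm.c" */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <sys/wait.h>

    static int shared_value = 0;

    static int child_fn(void *arg)
    {
        shared_value = 42;  /* without CLONE_VM, only the child's copy changes */
        return 0;
    }

    int main(void)
    {
        char *stack = malloc(64 * 1024);
        pid_t pid;

        if (stack == NULL)
            return 1;
        /* Same spirit as the UML stub: no CLONE_VM, so the child is a
           full process with its own page tables, not a thread. */
        pid = clone(child_fn, stack + 64 * 1024,
                    CLONE_FILES | SIGCHLD, NULL);
        if (pid < 0)
            return 1;
        waitpid(pid, NULL, 0);
        /* prints 0: the child's write did not affect this address space */
        printf("parent sees shared_value = %d\n", shared_value);
        free(stack);
        return 0;
    }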

Thus:

- batch system schedulers rightly consider each UML "thread" a
  separate process;

- however, UML "threads" do share a large portion of their memory, as
  can be seen from this "top" output:

      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
      6467 admin     15   0 32.0g  13g  13g S  0.0 27.7   0:00.00 kernel64-3.0.4
      6466 admin     16   0 32.0g  13g  13g S  0.0 27.7   0:00.15 kernel64-3.0.4
      6465 admin     22   0 32.0g  13g  13g S  0.0 27.7   0:00.00 kernel64-3.0.4
      6458 admin     15   0 32.0g  13g  13g S 39.2 27.7  37:00.04 kernel64-3.0.4
      7437 admin     15   0 12.0g  12g  12g T 52.9 25.6  70:54.39 kernel64-3.0.4

- so the problem lies in the algorithm that SGE and TORQUE use to
  compute memory usage, which apparently just sums the VSZ of each
  process (fast), instead of counting pages and making sure each
  shared page is counted only once (slow; see the sketch below)?
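
If it really is just a per-process VSZ sum, then a rough sketch of the
"slow" alternative would be to sum the Pss ("proportional set size")
figures from /proc/<pid>/smaps, which charge each shared page
fractionally to the processes mapping it.  This is not SGE or TORQUE
code, just an illustration; the PID list is assumed to come from the
batch system's own job bookkeeping:

    /* pss-sum.c : illustrative only; sums Pss over a set of PIDs.
       Hypothetical usage: ./pss-sum 6458 6465 6466 6467 */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>

    /* Pss of one process in kB, or 0 if its smaps cannot be read. */
    static long pss_kb(pid_t pid)
    {
        char path[64], line[256];
        long total = 0, val;
        FILE *f;

        snprintf(path, sizeof(path), "/proc/%d/smaps", (int) pid);
        f = fopen(path, "r");
        if (f == NULL)
            return 0;
        while (fgets(line, sizeof(line), f) != NULL)
            if (sscanf(line, "Pss: %ld kB", &val) == 1)
                total += val;
        fclose(f);
        return total;
    }

    int main(int argc, char **argv)
    {
        long job_total = 0;
        int i;

        for (i = 1; i < argc; i++)
            job_total += pss_kb((pid_t) atoi(argv[i]));
        printf("job memory (sum of Pss): %ld kB\n", job_total);
        return 0;
    }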

Thanks for any clarification!

Riccardo
