Hi Ralph,

There is an MCA param that tells the orted to set its usage limits to the hard 
limit:

    MCA opal: parameter "opal_set_max_sys_limits" (current value: <0>, data source: default value)
              Set to non-zero to automatically set any system-imposed limits to the maximum allowed
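
For illustration, here is a minimal sketch of the two usual ways to pass an MCA parameter to a job: on the mpirun command line with "--mca", or through an OMPI_MCA_-prefixed environment variable. The helper names and "./my_app" are hypothetical, and the sketch only builds the command and environment; it does not launch anything.

```python
import os


def mpirun_command(rest, set_max_limits=True):
    """Build an mpirun argv that passes the MCA parameter on the command line."""
    cmd = ["mpirun"]
    if set_max_limits:
        cmd += ["--mca", "opal_set_max_sys_limits", "1"]
    return cmd + list(rest)


def mca_environment(base_env=None):
    """Return a copy of the environment with the parameter set via OMPI_MCA_*."""
    env = dict(os.environ if base_env is None else base_env)
    env["OMPI_MCA_opal_set_max_sys_limits"] = "1"
    return env


# "./my_app" is a placeholder application name.
print(" ".join(mpirun_command(["-np", "4", "./my_app"])))
```

Either form only raises limits to the hard limit; neither gives per-job control over the soft limit.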

The orted could be used to set the soft limit down from that value on a per-job 
basis, but we didn't provide a mechanism for specifying it. Would be relatively 
easy to do, though.

What version are you using? If I create a patch, would you be willing to test 
it?

1.4.2, with 1.4.1 available, and 1.4.3 waiting in the wings.
I would love to test any patch you could come up with.
The ability to set any valid limit to any valid value,
applied equally to all processes, would go a long way in
making our environment more stable.  Thanks!

Hi,

We would like to set process memory limits (vmemoryuse, in csh
terms) on remote processes.  Our batch system is torque/moab.

The nodes of our cluster each have 24GB of physical memory, of
which 4GB is taken up by the kernel and the root file system.
Note that these are diskless nodes, so no swap either.

We can globally set the per-process limit to 2.5GB.  This works
fine if applications run "packed":  8 MPI tasks running on each
8-core node, for an aggregate limit of 20GB.  However, if a job
only wants to run 4 tasks, the soft limit can safely be raised
to 5GB.  2 tasks, 10GB.  1 task, the full 20GB.
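
The arithmetic above can be sketched as follows, assuming the 20GB of usable memory per node is split evenly among the tasks placed on that node. The function name and the default value are illustrative only.

```python
def per_task_limit_gb(tasks_per_node, usable_gb=20.0):
    """Split a node's usable memory evenly across the tasks running on it."""
    if tasks_per_node < 1:
        raise ValueError("tasks_per_node must be at least 1")
    return usable_gb / tasks_per_node


# The four cases from the message: 8, 4, 2 and 1 task(s) per node.
for tasks in (8, 4, 2, 1):
    print(tasks, "task(s):", per_task_limit_gb(tasks), "GB each")
```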

Upping the soft limit in the batch script itself only affects
the "head node" of the job.  Since limits are not part of the
"environment", I can find no way to propagate them to remote nodes.

If I understand how this all works, the remote processes are
started by orted, and therefore inherit its limits.  Is there
any sort of orted configuration that can help here?  Any other
thoughts about how to approach this?
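
One possible approach, sketched here under assumptions, is to launch a small wrapper on every node in place of the application. Since limits are inherited across exec, the wrapper sets its own soft limit and then replaces itself with the real program. This is a generic workaround, not an Open MPI feature. It assumes Linux, and assumes that RLIMIT_AS is the limit that csh reports as vmemoryuse. The script name and its arguments are hypothetical.

```python
import os
import resource
import sys


def clamp_to_hard(requested, hard):
    """Return the soft limit to request, never exceeding the hard limit."""
    if hard == resource.RLIM_INFINITY:
        return requested
    return min(requested, hard)


def set_soft_vmem_limit(requested_bytes):
    """Set the soft address-space limit for this process and its children."""
    _, hard = resource.getrlimit(resource.RLIMIT_AS)
    soft = clamp_to_hard(requested_bytes, hard)
    resource.setrlimit(resource.RLIMIT_AS, (soft, hard))
    return soft


def main(argv):
    # Hypothetical usage: limit_wrap.py <limit_in_GB> <program> [args...]
    limit_gb = float(argv[1])
    set_soft_vmem_limit(int(limit_gb * 1024**3))
    # Replace this process with the application; the new limit is inherited.
    os.execvp(argv[2], argv[2:])


if __name__ == "__main__" and len(sys.argv) >= 3:
    main(sys.argv)
```

A job running 4 tasks per node might then be started as "mpirun -np 4 python limit_wrap.py 5 ./my_app", where both the wrapper name and the application name are placeholders.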

Thanks!

--
Best regards,

David Turner
User Services Group        email: dptur...@lbl.gov
NERSC Division             phone: (510) 486-4027
Lawrence Berkeley Lab        fax: (510) 486-4316
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
