Re: [OMPI users] memory limits on remote nodes

2010-10-07 Thread David Turner

Hi Ralph,


There is an MCA param that tells the orted to set its usage limits to the hard 
limit:

 MCA opal: parameter "opal_set_max_sys_limits"
           (current value: <0>, data source: default value)
           Set to non-zero to automatically set any system-imposed
           limits to the maximum allowed

The orted could be used to set the soft limit down from that value on a per-job 
basis, but we didn't provide a mechanism for specifying it. Would be relatively 
easy to do, though.

What version are you using? If I create a patch, would you be willing to test 
it?


1.4.2, with 1.4.1 available, and 1.4.3 waiting in the wings.
I would love to test any patch you could come up with.
The ability to set any valid limit to any valid value,
applied equally to all processes, would go a long way in
making our environment more stable.  Thanks!
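For reference, a small shell sketch (not Open MPI code) of what the parameter describes — it would be enabled the usual MCA way, and its effect amounts to raising each soft limit up to the hard limit:

```shell
# Sketch only: the MCA param would be set on the command line, e.g.
#   mpirun --mca opal_set_max_sys_limits 1 -np 8 ./a.out
# Its described effect -- raise soft limits to the hard limit -- shown
# here for a shell's virtual-memory limit:
soft=$(sh -c 'ulimit -S -v "$(ulimit -H -v)"; ulimit -S -v')
hard=$(ulimit -H -v)
echo "soft=$soft hard=$hard"   # the two now match
```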


Hi,

We would like to set process memory limits (vmemoryuse, in csh
terms) on remote processes.  Our batch system is torque/moab.

The nodes of our cluster each have 24GB of physical memory, of
which 4GB is taken up by the kernel and the root file system.
Note that these are diskless nodes, so no swap either.

We can globally set the per-process limit to 2.5GB.  This works
fine if applications run "packed":  8 MPI tasks running on each
8-core node, for an aggregate limit of 20GB.  However, if a job
only wants to run 4 tasks, the soft limit can safely be raised
to 5GB.  2 tasks, 10GB.  1 task, the full 20GB.
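The arithmetic above can be written directly into a batch script; a minimal sketch, assuming the 20 GB usable and 4 tasks per node from the figures above:

```shell
# Sketch: derive a per-task soft limit from usable node memory and the
# number of tasks per node.
USABLE_KB=$((20 * 1024 * 1024))        # 20 GB expressed in KB
TASKS_PER_NODE=4
LIMIT_KB=$((USABLE_KB / TASKS_PER_NODE))
echo "$LIMIT_KB"                        # 5242880 KB = 5 GB per task
# ulimit -S -v "$LIMIT_KB"              # would apply it to this shell's children
```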

Upping the soft limit in the batch script itself only affects
the "head node" of the job.  Since limits are not part of the
"environment", I can find no way to propagate them to remote nodes.

If I understand how this all works, the remote processes are
started by orted, and therefore inherit its limits.  Is there
any sort of orted configuration that can help here?  Any other
thoughts about how to approach this?
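That inheritance is easy to demonstrate locally: limits travel parent-to-child at fork time, not through the environment, which is why a process launched by a remote orted sees the orted's limits rather than the batch script's:

```shell
# A parent lowers its soft virtual-memory limit, then launches a child;
# the child inherits the limit even though nothing was exported.
child_vmem=$( (ulimit -S -v 1048576; sh -c 'ulimit -S -v') )
echo "$child_vmem"    # prints 1048576 (KB), the parent's soft limit
```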

Thanks!

--
Best regards,

David Turner
User Services Group    email: dptur...@lbl.gov
NERSC Division         phone: (510) 486-4027
Lawrence Berkeley Lab  fax:   (510) 486-4316
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users








Re: [OMPI users] memory limits on remote nodes

2010-10-07 Thread Reuti
Am 07.10.2010 um 01:55 schrieb David Turner:

> Hi,
> 
> We would like to set process memory limits (vmemoryuse, in csh
> terms) on remote processes.  Our batch system is torque/moab.

Isn't it possible to set this up in torque/moab directly? In SGE I would simply
define h_vmem, which is then applied per slot; and with a tight integration all
Open MPI processes will be children of sge_execd, so the limit will be enforced.
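As an illustration of Reuti's suggestion (the PE name and job script here are made up, not from the thread):

```shell
# Hypothetical SGE submission: h_vmem is a per-slot request, so each of
# the job's 4 slots gets a 5 GB limit enforced by sge_execd on its node.
qsub -pe orte 4 -l h_vmem=5G job.sh
```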

-- Reuti


> The nodes of our cluster each have 24GB of physical memory, of
> which 4GB is taken up by the kernel and the root file system.
> Note that these are diskless nodes, so no swap either.
> 
> We can globally set the per-process limit to 2.5GB.  This works
> fine if applications run "packed":  8 MPI tasks running on each
> 8-core node, for an aggregate limit of 20GB.  However, if a job
> only wants to run 4 tasks, the soft limit can safely be raised
> to 5GB.  2 tasks, 10GB.  1 task, the full 20GB.
> 
> Upping the soft limit in the batch script itself only affects
> the "head node" of the job.  Since limits are not part of the
> "environment", I can find no way to propagate them to remote nodes.
> 
> If I understand how this all works, the remote processes are
> started by orted, and therefore inherit its limits.  Is there
> any sort of orted configuration that can help here?  Any other
> thoughts about how to approach this?
> 
> Thanks!
> 




[OMPI users] Pros and cons of --enable-heterogeneous

2010-10-07 Thread David Ronis
I have various boxes that run openmpi and I can't seem to use all of
them at once because they have different CPUs (e.g., Pentiums and
Athlons (both 32-bit) vs. Intel i7 (64-bit)).  I'm about to build 1.4.3
and was wondering if I should add --enable-heterogeneous to the
configure flags.  Any advice as to why or why not would be appreciated.

David




Re: [OMPI users] Pros and cons of --enable-heterogeneous

2010-10-07 Thread Durga Choudhury
I'd like to add to this question the following:

If I compile with the --enable-heterogeneous flag for different
*architectures* (I have a mix of old 32-bit x86, newer x86_64, and some
Cell BE based boxes (PS3s)), would I be able to form an MPD ring between
all these different machines?

Best regards
Durga

On Thu, Oct 7, 2010 at 3:44 PM, David Ronis  wrote:
> I have various boxes that run openmpi and I can't seem to use all of
> them at once because they have different CPUs (e.g., Pentiums and
> Athlons (both 32-bit) vs. Intel i7 (64-bit)).  I'm about to build 1.4.3
> and was wondering if I should add --enable-heterogeneous to the
> configure flags.  Any advice as to why or why not would be appreciated.
>
> David
>
>



Re: [OMPI users] Pros and cons of --enable-heterogeneous

2010-10-07 Thread Ralph Castain
Hetero operations tend to lose a little performance due to the need to
convert data, but otherwise there is no real negative. We don't do it by
default solely because the majority of installations don't need to, and
there is no reason to lose even a little performance if it isn't necessary.

If you want an application to be able to span that mix, then you'll need to
set that configure flag.
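For a 1.4.3 build that would look something like the following (the install prefix is an assumption):

```shell
# Sketch: configuring Open MPI 1.4.3 with heterogeneous support.
./configure --enable-heterogeneous --prefix=/opt/openmpi-1.4.3
make all install
```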

On Thu, Oct 7, 2010 at 1:44 PM, David Ronis  wrote:

> I have various boxes that run openmpi and I can't seem to use all of
> them at once because they have different CPUs (e.g., Pentiums and
> Athlons (both 32-bit) vs. Intel i7 (64-bit)).  I'm about to build 1.4.3
> and was wondering if I should add --enable-heterogeneous to the
> configure flags.  Any advice as to why or why not would be appreciated.
>
> David
>
>


Re: [OMPI users] Pros and cons of --enable-heterogeneous

2010-10-07 Thread Ralph Castain
The short answer is "yes". It should work.


On Thu, Oct 7, 2010 at 1:53 PM, Durga Choudhury  wrote:

> I'd like to add to this question the following:
>
> If I compile with the --enable-heterogeneous flag for different
> *architectures* (I have a mix of old 32-bit x86, newer x86_64, and some
> Cell BE based boxes (PS3s)), would I be able to form an MPD ring between
> all these different machines?
>
> Best regards
> Durga
>
> On Thu, Oct 7, 2010 at 3:44 PM, David Ronis  wrote:
> > I have various boxes that run openmpi and I can't seem to use all of
> > them at once because they have different CPUs (e.g., Pentiums and
> > Athlons (both 32-bit) vs. Intel i7 (64-bit)).  I'm about to build 1.4.3
> > and was wondering if I should add --enable-heterogeneous to the
> > configure flags.  Any advice as to why or why not would be appreciated.
> > Any advice as to why or why not would be appreciated.
> >
> > David
> >
> >



Re: [OMPI users] Pros and cons of --enable-heterogeneous

2010-10-07 Thread David Ronis
Ralph, thanks for the reply.

If I build with --enable-heterogeneous and then decide to run on a
homogeneous set of nodes, does the additional "overhead" go away or
become completely negligible, i.e., when no conversion is necessary?

David


On Thu, 2010-10-07 at 15:17 -0600, Ralph Castain wrote:
> Hetero operations tend to lose a little performance due to the need to
> convert data, but otherwise there is no real negative. We don't do it
> by default solely because the majority of installations don't need to,
> and there is no reason to lose even a little performance if it isn't
> necessary.
> 
> 
> If you want an application to be able to span that mix, then you'll
> need to set that configure flag.
> 
> On Thu, Oct 7, 2010 at 1:44 PM, David Ronis  wrote:
> > I have various boxes that run openmpi and I can't seem to use all of
> > them at once because they have different CPUs (e.g., Pentiums and
> > Athlons (both 32-bit) vs. Intel i7 (64-bit)).  I'm about to build
> > 1.4.3 and was wondering if I should add --enable-heterogeneous to
> > the configure flags.  Any advice as to why or why not would be
> > appreciated.
> >
> > David




Re: [OMPI users] memory limits on remote nodes

2010-10-07 Thread Ralph Castain

On Oct 7, 2010, at 2:55 AM, Reuti wrote:

> Am 07.10.2010 um 01:55 schrieb David Turner:
> 
>> Hi,
>> 
>> We would like to set process memory limits (vmemoryuse, in csh
>> terms) on remote processes.  Our batch system is torque/moab.
> 
> Isn't it possible to set this up in torque/moab directly? In SGE I would
> simply define h_vmem, which is then applied per slot; and with a tight
> integration all Open MPI processes will be children of sge_execd, so the
> limit will be enforced.

I could be wrong, but I -think- the issue here is that the soft limits need to 
be set on a per-job basis.

> 
> -- Reuti
> 
> 
>> The nodes of our cluster each have 24GB of physical memory, of
>> which 4GB is taken up by the kernel and the root file system.
>> Note that these are diskless nodes, so no swap either.
>> 
>> We can globally set the per-process limit to 2.5GB.  This works
>> fine if applications run "packed":  8 MPI tasks running on each
>> 8-core node, for an aggregate limit of 20GB.  However, if a job
>> only wants to run 4 tasks, the soft limit can safely be raised
>> to 5GB.  2 tasks, 10GB.  1 task, the full 20GB.
>> 
>> Upping the soft limit in the batch script itself only affects
>> the "head node" of the job.  Since limits are not part of the
>> "environment", I can find no way to propagate them to remote nodes.
>> 
>> If I understand how this all works, the remote processes are
>> started by orted, and therefore inherit its limits.  Is there
>> any sort of orted configuration that can help here?  Any other
>> thoughts about how to approach this?
>> 
>> Thanks!
>> 




Re: [OMPI users] memory limits on remote nodes

2010-10-07 Thread Ralph Castain

On Oct 6, 2010, at 11:25 PM, David Turner wrote:

> Hi Ralph,
> 
>> There is an MCA param that tells the orted to set its usage limits to the 
>> hard limit:
>> 
>> MCA opal: parameter "opal_set_max_sys_limits"
>>           (current value: <0>, data source: default value)
>>           Set to non-zero to automatically set any system-imposed
>>           limits to the maximum allowed
>> 
>> The orted could be used to set the soft limit down from that value on a 
>> per-job basis, but we didn't provide a mechanism for specifying it. Would be 
>> relatively easy to do, though.
>> 
>> What version are you using? If I create a patch, would you be willing to 
>> test it?
> 
> 1.4.2, with 1.4.1 available, and 1.4.3 waiting in the wings.
> I would love to test any patch you could come up with.
> The ability to set any valid limit to any valid value,
> applied equally to all processes, would go a long way in
> making our environment more stable.  Thanks!

Just to be sure I'm on track here: setting the soft limit will cause the job to 
terminate if any process attempts to use more memory than that limit. This is 
what you want to have happen?

I ask because we have a memory usage monitor in OMPI right now (in the trunk, 
not in 1.4 series) that does exactly what I've described, and the limit can be 
set for each job. So I'm wondering if the answer here is just to suggest you 
try with the trunk and see if it does what you want?


> 
>>> Hi,
>>> 
>>> We would like to set process memory limits (vmemoryuse, in csh
>>> terms) on remote processes.  Our batch system is torque/moab.
>>> 
>>> The nodes of our cluster each have 24GB of physical memory, of
>>> which 4GB is taken up by the kernel and the root file system.
>>> Note that these are diskless nodes, so no swap either.
>>> 
>>> We can globally set the per-process limit to 2.5GB.  This works
>>> fine if applications run "packed":  8 MPI tasks running on each
>>> 8-core node, for an aggregate limit of 20GB.  However, if a job
>>> only wants to run 4 tasks, the soft limit can safely be raised
>>> to 5GB.  2 tasks, 10GB.  1 task, the full 20GB.
>>> 
>>> Upping the soft limit in the batch script itself only affects
>>> the "head node" of the job.  Since limits are not part of the
>>> "environment", I can find no way to propagate them to remote nodes.
>>> 
>>> If I understand how this all works, the remote processes are
>>> started by orted, and therefore inherit its limits.  Is there
>>> any sort of orted configuration that can help here?  Any other
>>> thoughts about how to approach this?
>>> 
>>> Thanks!
>>> 




Re: [OMPI users] Pros and cons of --enable-heterogeneous

2010-10-07 Thread Eugene Loh

David Ronis wrote:

> Ralph, thanks for the reply.
>
> If I build with --enable-heterogeneous and then decide to run on a
> homogeneous set of nodes, does the additional "overhead" go away or
> become completely negligible, i.e., when no conversion is necessary?

I'm no expert, but I think the overhead does not go away.  Even if you 
run on a homogeneous set of nodes, a local node does not know that.  It 
prepares a message without knowing whether the destination is "same" or 
"different".  (There may be an exception with the sm BTL, which is only 
for processes on the same node and where it is assumed that a node 
comprises homogeneous processors.)


Whether the overhead is significant or negligible is another matter, and 
a subjective one.  I suppose you could try some tests and judge for 
yourself in your own case.