Re: [OMPI users] memory limits on remote nodes
Hi Ralph,

> There is an MCA param that tells the orted to set its usage limits to the hard limit:
>
>   MCA opal: parameter "opal_set_max_sys_limits" (current value: <0>, data source: default value)
>             Set to non-zero to automatically set any system-imposed limits to the maximum allowed
>
> The orted could be used to set the soft limit down from that value on a per-job basis, but we didn't provide a mechanism for specifying it. Would be relatively easy to do, though.
>
> What version are you using? If I create a patch, would you be willing to test it?

1.4.2, with 1.4.1 available and 1.4.3 waiting in the wings. I would love to test any patch you could come up with. The ability to set any valid limit to any valid value, applied equally to all processes, would go a long way toward making our environment more stable. Thanks!

>> Hi,
>>
>> We would like to set process memory limits (vmemoryuse, in csh terms) on remote processes. Our batch system is torque/moab.
>>
>> The nodes of our cluster each have 24 GB of physical memory, of which 4 GB is taken up by the kernel and the root file system. Note that these are diskless nodes, so there is no swap either.
>>
>> We can globally set the per-process limit to 2.5 GB. This works fine if applications run "packed": 8 MPI tasks running on each 8-core node, for an aggregate limit of 20 GB. However, if a job only wants to run 4 tasks, the soft limit can safely be raised to 5 GB; with 2 tasks, 10 GB; with 1 task, the full 20 GB.
>>
>> Upping the soft limit in the batch script itself only affects the "head node" of the job. Since limits are not part of the "environment", I can find no way to propagate them to remote nodes.
>>
>> If I understand how this all works, the remote processes are started by orted, and therefore inherit its limits. Is there any sort of orted configuration that can help here? Any other thoughts about how to approach this?
>>
>> Thanks!

--
Best regards,

David Turner
User Services Group      email: dptur...@lbl.gov
NERSC Division           phone: (510) 486-4027
Lawrence Berkeley Lab    fax:   (510) 486-4316
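A possible stopgap, sketched below, is to have each MPI process adjust its own soft limit right after startup, since the remote ranks inherit whatever limits the orted holds. This is only an illustrative sketch and not anything from the thread: it assumes the hard limit on the compute nodes is already high enough (an unprivileged process can raise its soft limit only up to the hard limit), uses RLIMIT_AS as a rough equivalent of csh's vmemoryuse, and takes the per-task budget in megabytes as a hypothetical command-line argument supplied by the job script.

/* Illustrative sketch only: each rank lowers or raises its own soft
 * virtual-memory limit after MPI_Init.  RLIMIT_AS as the analogue of
 * vmemoryuse, and the argv[1] budget, are assumptions for this example. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* Hypothetical per-task budget in MB, passed by the job script. */
    long budget_mb = (argc > 1) ? atol(argv[1]) : 2560;  /* default ~2.5 GB */

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    struct rlimit rl;
    if (getrlimit(RLIMIT_AS, &rl) == 0) {
        rl.rlim_cur = (rlim_t)budget_mb * 1024 * 1024;    /* new soft limit */
        if (rl.rlim_max != RLIM_INFINITY && rl.rlim_cur > rl.rlim_max)
            rl.rlim_cur = rl.rlim_max;                    /* cannot exceed hard limit */
        if (setrlimit(RLIMIT_AS, &rl) != 0)
            perror("setrlimit");
        printf("rank %d: soft RLIMIT_AS is now %lu bytes\n",
               rank, (unsigned long)rl.rlim_cur);
    }

    /* ... application work would go here ... */

    MPI_Finalize();
    return 0;
}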
Re: [OMPI users] memory limits on remote nodes
On 07.10.2010 at 01:55, David Turner wrote:

> Hi,
>
> We would like to set process memory limits (vmemoryuse, in csh terms) on remote processes. Our batch system is torque/moab.

Isn't it possible to set this up in torque/moab directly? In SGE I would simply define h_vmem, which is then applied per slot; with a tight integration, all Open MPI processes are children of sge_execd and the limit is enforced.

-- Reuti

> The nodes of our cluster each have 24 GB of physical memory, of which 4 GB is taken up by the kernel and the root file system. Note that these are diskless nodes, so there is no swap either.
>
> We can globally set the per-process limit to 2.5 GB. This works fine if applications run "packed": 8 MPI tasks running on each 8-core node, for an aggregate limit of 20 GB. However, if a job only wants to run 4 tasks, the soft limit can safely be raised to 5 GB; with 2 tasks, 10 GB; with 1 task, the full 20 GB.
>
> Upping the soft limit in the batch script itself only affects the "head node" of the job. Since limits are not part of the "environment", I can find no way to propagate them to remote nodes.
>
> If I understand how this all works, the remote processes are started by orted, and therefore inherit its limits. Is there any sort of orted configuration that can help here? Any other thoughts about how to approach this?
>
> Thanks!
>
> --
> Best regards,
>
> David Turner
> User Services Group      email: dptur...@lbl.gov
> NERSC Division           phone: (510) 486-4027
> Lawrence Berkeley Lab    fax:   (510) 486-4316
[OMPI users] Pros and cons of --enable-heterogeneous
I have various boxes that run Open MPI, and I can't seem to use all of them at once because they have different CPUs (e.g., Pentiums and Athlons, both 32-bit, vs. an Intel i7, which is 64-bit). I'm about to build 1.4.3 and was wondering if I should add --enable-heterogeneous to the configure flags. Any advice as to why or why not would be appreciated.

David
Re: [OMPI users] Pros and cons of --enable-heterogeneous
I'd like to add to this question the following:

If I compile with the --enable-heterogeneous flag for different *architectures* (I have a mix of old 32-bit x86, newer x86_64, and some Cell BE based boxes (PS3)), would I be able to form an MPD ring between all these different machines?

Best regards
Durga

On Thu, Oct 7, 2010 at 3:44 PM, David Ronis wrote:
> I have various boxes that run Open MPI, and I can't seem to use all of them at once because they have different CPUs (e.g., Pentiums and Athlons, both 32-bit, vs. an Intel i7, which is 64-bit). I'm about to build 1.4.3 and was wondering if I should add --enable-heterogeneous to the configure flags. Any advice as to why or why not would be appreciated.
>
> David
Re: [OMPI users] Pros and cons of --enable-heterogeneous
Hetero operations tend to lose a little performance due to the need to convert data, but otherwise there is no real negative. We don't enable it by default solely because the majority of installations don't need it, and there is no reason to lose even a little performance if it isn't necessary.

If you want an application to be able to span that mix, then you'll need to set that configure flag.

On Thu, Oct 7, 2010 at 1:44 PM, David Ronis wrote:
> I have various boxes that run Open MPI, and I can't seem to use all of them at once because they have different CPUs (e.g., Pentiums and Athlons, both 32-bit, vs. an Intel i7, which is 64-bit). I'm about to build 1.4.3 and was wondering if I should add --enable-heterogeneous to the configure flags. Any advice as to why or why not would be appreciated.
>
> David
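In case it helps to see it concretely, here is a minimal sketch (generic MPI, not anything taken from this thread) of the kind of exchange a heterogeneous build handles transparently: the application sends typed data, and any byte-order or width conversion, the data conversion mentioned above, happens inside the library, so the source code looks the same either way.

/* Minimal sketch: typed point-to-point exchange.  In a build configured
 * with --enable-heterogeneous, the library converts the representation
 * of MPI_DOUBLE between differing architectures; the application code
 * is identical in either build. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double payload[4] = {1.0, 2.0, 3.0, 4.0};

    if (rank == 0) {
        /* Rank 0 might run on a 32-bit Pentium, rank 1 on a 64-bit i7;
         * the typed send lets the library do any needed conversion. */
        MPI_Send(payload, 4, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(payload, 4, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %g %g %g %g\n",
               payload[0], payload[1], payload[2], payload[3]);
    }

    MPI_Finalize();
    return 0;
}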
Re: [OMPI users] Pros and cons of --enable-heterogeneous
The short answer is "yes". It should work.

On Thu, Oct 7, 2010 at 1:53 PM, Durga Choudhury wrote:
> I'd like to add to this question the following:
>
> If I compile with the --enable-heterogeneous flag for different *architectures* (I have a mix of old 32-bit x86, newer x86_64, and some Cell BE based boxes (PS3)), would I be able to form an MPD ring between all these different machines?
>
> Best regards
> Durga
>
> On Thu, Oct 7, 2010 at 3:44 PM, David Ronis wrote:
> > I have various boxes that run Open MPI, and I can't seem to use all of them at once because they have different CPUs (e.g., Pentiums and Athlons, both 32-bit, vs. an Intel i7, which is 64-bit). I'm about to build 1.4.3 and was wondering if I should add --enable-heterogeneous to the configure flags. Any advice as to why or why not would be appreciated.
> >
> > David
Re: [OMPI users] Pros and cons of --enable-heterogeneous
Ralph, thanks for the reply.

If I build with --enable-heterogeneous and then decide to run on a homogeneous set of nodes, does the additional "overhead" go away or become completely negligible, i.e., when no conversion is necessary?

David

On Thu, 2010-10-07 at 15:17 -0600, Ralph Castain wrote:
> Hetero operations tend to lose a little performance due to the need to convert data, but otherwise there is no real negative. We don't enable it by default solely because the majority of installations don't need it, and there is no reason to lose even a little performance if it isn't necessary.
>
> If you want an application to be able to span that mix, then you'll need to set that configure flag.
>
> On Thu, Oct 7, 2010 at 1:44 PM, David Ronis wrote:
> > I have various boxes that run Open MPI, and I can't seem to use all of them at once because they have different CPUs (e.g., Pentiums and Athlons, both 32-bit, vs. an Intel i7, which is 64-bit). I'm about to build 1.4.3 and was wondering if I should add --enable-heterogeneous to the configure flags. Any advice as to why or why not would be appreciated.
> >
> > David
Re: [OMPI users] memory limits on remote nodes
On Oct 7, 2010, at 2:55 AM, Reuti wrote:

> On 07.10.2010 at 01:55, David Turner wrote:
>
>> Hi,
>>
>> We would like to set process memory limits (vmemoryuse, in csh terms) on remote processes. Our batch system is torque/moab.
>
> Isn't it possible to set this up in torque/moab directly? In SGE I would simply define h_vmem, which is then applied per slot; with a tight integration, all Open MPI processes are children of sge_execd and the limit is enforced.
>
> -- Reuti

I could be wrong, but I -think- the issue here is that the soft limits need to be set on a per-job basis.
Re: [OMPI users] memory limits on remote nodes
On Oct 6, 2010, at 11:25 PM, David Turner wrote:

> Hi Ralph,
>
>> There is an MCA param that tells the orted to set its usage limits to the hard limit:
>>
>>   MCA opal: parameter "opal_set_max_sys_limits" (current value: <0>, data source: default value)
>>             Set to non-zero to automatically set any system-imposed limits to the maximum allowed
>>
>> The orted could be used to set the soft limit down from that value on a per-job basis, but we didn't provide a mechanism for specifying it. Would be relatively easy to do, though.
>>
>> What version are you using? If I create a patch, would you be willing to test it?
>
> 1.4.2, with 1.4.1 available and 1.4.3 waiting in the wings. I would love to test any patch you could come up with. The ability to set any valid limit to any valid value, applied equally to all processes, would go a long way toward making our environment more stable. Thanks!

Just to be sure I'm on track here: setting the soft limit will cause the job to terminate if any process attempts to use more memory than that limit. Is this what you want to have happen?

I ask because we have a memory usage monitor in OMPI right now (in the trunk, not in the 1.4 series) that does exactly what I've described, and the limit can be set for each job. So I'm wondering if the answer here is just to suggest you try the trunk and see if it does what you want?
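For anyone who can't move to the trunk, a very rough application-level stand-in (emphatically not the OMPI trunk monitor referred to above, just a sketch) would be for each rank to compare its own usage against a per-job budget and abort when it is exceeded. The /proc/self/status parsing below is Linux-specific, and the 5 GB default budget is an assumed example value.

/* Rough application-level sketch of a per-process memory check, NOT the
 * Open MPI trunk monitor.  Linux-specific: reads VmSize from
 * /proc/self/status.  The budget value is an illustrative assumption. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return current virtual memory size in kB, or -1 on failure. */
static long vmsize_kb(void)
{
    FILE *f = fopen("/proc/self/status", "r");
    if (!f) return -1;
    char line[256];
    long kb = -1;
    while (fgets(line, sizeof(line), f)) {
        if (strncmp(line, "VmSize:", 7) == 0) {
            kb = atol(line + 7);
            break;
        }
    }
    fclose(f);
    return kb;
}

/* Call this from time to time inside the application's main loop. */
static void check_memory_budget(long budget_kb)
{
    long used = vmsize_kb();
    if (used > budget_kb) {
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        fprintf(stderr, "rank %d over budget: %ld kB > %ld kB, aborting\n",
                rank, used, budget_kb);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    /* Assumed per-task budget of 5 GB, expressed in kB. */
    check_memory_budget(5L * 1024 * 1024);
    MPI_Finalize();
    return 0;
}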
Re: [OMPI users] Pros and cons of --enable-heterogeneous
David Ronis wrote:

> Ralph, thanks for the reply.
>
> If I build with --enable-heterogeneous and then decide to run on a homogeneous set of nodes, does the additional "overhead" go away or become completely negligible, i.e., when no conversion is necessary?

I'm no expert, but I think the overhead does not go away. Even if you run on a homogeneous set of nodes, a local node does not know that. It prepares a message without knowing whether the destination is "same" or "different". (There may be an exception with the sm BTL, which is only for processes on the same node and where it is assumed that a node comprises homogeneous processors.)

Whether the overhead is significant or negligible is another matter. A subjective matter. I suppose you could try some tests and judge for yourself for your case.
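Along the lines of "try some tests and judge for yourself", a simple ping-pong timing run on the same pair of nodes, once with a default build and once with an --enable-heterogeneous build, would show whether the difference matters for your message sizes. The sketch below is generic MPI and not from this thread; the message size and iteration count are arbitrary choices.

/* Minimal ping-pong timing sketch for comparing a default build against
 * an --enable-heterogeneous build on the same pair of nodes.  Run with
 * exactly two ranks; sizes and iteration count are arbitrary. */
#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    enum { COUNT = 100000, ITERS = 1000 };   /* 100k doubles, 1000 round trips */
    static double buf[COUNT];
    memset(buf, 0, sizeof(buf));

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < ITERS; i++) {
        if (rank == 0) {
            MPI_Send(buf, COUNT, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, COUNT, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, COUNT, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, COUNT, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t1 = MPI_Wtime();

    if (rank == 0)
        printf("average round trip: %.3f ms\n", (t1 - t0) / ITERS * 1e3);

    MPI_Finalize();
    return 0;
}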