Re: [OMPI users] job running out of memory

2014-11-24 Thread Jerry Mersel
: Thursday, November 20, 2014 5:42 PM To: Open MPI Users Subject: Re: [OMPI users] job running out of memory It wouldn’t be maffinity - that just tells the OS on a single node to ensure that the memory is local to the process. If you have a managed environment (i.e., there is a scheduler running tha

Re: [OMPI users] job running out of memory

2014-11-20 Thread Ralph Castain
no other way." > —John Holt > > > From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain > Sent: Tuesday, November 18, 2014 5:56 PM > To: Open MPI Users > Subject: Re: [OMPI users] job running out of memory > > Unfortunately, there is no

Re: [OMPI users] job running out of memory

2014-11-20 Thread Jerry Mersel
shortcomings...” --Reb Elimelech of Lizhensk "We learn something by doing it. There is no other way." —John Holt From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain Sent: Tuesday, November 18, 2014 5:56 PM To: Open MPI Users Subject: Re: [OMPI users] job running out

Re: [OMPI users] job running out of memory

2014-11-18 Thread Jerry Mersel
en-mpi.org] On Behalf Of Ralph Castain Sent: Tuesday, November 18, 2014 5:56 PM To: Open MPI Users Subject: Re: [OMPI users] job running out of memory Unfortunately, there is no way to share memory across nodes. Running out of memory as you describe can be due to several factors, including most typic

Re: [OMPI users] job running out of memory

2014-11-18 Thread Ralph Castain
Unfortunately, there is no way to share memory across nodes. Running out of memory as you describe can be due to several factors, including most typically:

* a memory leak in the application, or the application simply growing too big for the environment
* one rank running slow, causing it to buil
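
To narrow down which of the causes above applies, it can help to log each rank's resident memory over time and total it per host. The following is only a rough sketch, not something from the thread: it assumes a Linux node where /proc/self/status exposes a VmRSS line, and the function names (parse_vm_rss_kib, current_rss_kib, find_heaviest_host) are hypothetical helpers made up for illustration. How the samples get collected from the ranks (for example, each rank printing its hostname and RSS to its own log) is left out.

```python
import re


def parse_vm_rss_kib(status_text):
    """Return the VmRSS value in KiB from /proc/<pid>/status text, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def current_rss_kib(path="/proc/self/status"):
    """Resident set size of the calling process in KiB (assumes Linux /proc)."""
    try:
        with open(path) as f:
            return parse_vm_rss_kib(f.read())
    except OSError:
        # Not on Linux, or /proc is unavailable.
        return None


def find_heaviest_host(samples):
    """Sum RSS per host and return (host, total_kib) for the largest total.

    samples is an iterable of (host, rank, rss_kib) tuples gathered from
    the ranks by whatever means the job has available.
    """
    totals = {}
    for host, _rank, rss_kib in samples:
        totals[host] = totals.get(host, 0) + rss_kib
    return max(totals.items(), key=lambda item: item[1])
```

If one rank's RSS keeps climbing while the others stay flat, that points toward a leak or a backlog on that rank; if every rank grows together, the job may simply be too big for the nodes.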

[OMPI users] job running out of memory

2014-11-18 Thread Jerry Mersel
Hi all: I am running Open MPI 1.6.5 and a job which is memory intensive. The job runs on 7 hosts using 16 cores on each. On one of the hosts the memory is exhausted, so the kernel starts to kill the processes. It could be that there is plenty of free memory on one of the other hosts. Is