Just glancing at the code, I don't see anything tied to 2**12 that pops out at 
me. I suspect you are hitting one of two system limits: either the limit on the 
number of child processes a single process can spawn (this is different from 
the total number of processes allowed on the node), or the limit on the number 
of file descriptors a process can have open (we need several per process for 
I/O forwarding).
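
A quick way to look at both limits is sketched below in Python, using the 
standard `resource` module. This is only a rough check, not how Open MPI 
itself accounts for descriptors: the figure of 4 descriptors per child and the 
16 reserved descriptors are assumptions made for illustration (the text above 
only says "several"), and `max_children_by_fds` is a hypothetical helper. 
Note also that `RLIMIT_NPROC` on Linux is a per-user process limit, which is 
the closest standard limit to check here.

```python
import resource

# Assumption for illustration only: the reply says "several" descriptors
# per process, not an exact number.
ASSUMED_FDS_PER_CHILD = 4


def show(value):
    """Render an rlimit value, which may be RLIM_INFINITY."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def max_children_by_fds(soft_nofile, fds_per_child, reserved=16):
    """Rough bound on how many children one parent can serve before it
    runs out of file descriptors. Returns None if the limit is unlimited.

    `reserved` is a hypothetical allowance for descriptors the parent
    already holds (stdio, sockets, log files).
    """
    if soft_nofile == resource.RLIM_INFINITY:
        return None
    return max(0, (soft_nofile - reserved) // fds_per_child)


if __name__ == "__main__":
    # Each call returns a (soft, hard) pair; the soft value is enforced.
    nofile_soft, nofile_hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # On Linux this limits processes per real user ID, not per parent.
    nproc_soft, nproc_hard = resource.getrlimit(resource.RLIMIT_NPROC)

    print("open files : soft", show(nofile_soft), "hard", show(nofile_hard))
    print("processes  : soft", show(nproc_soft), "hard", show(nproc_hard))

    bound = max_children_by_fds(nofile_soft, ASSUMED_FDS_PER_CHILD)
    print("children supportable by descriptors (rough):",
          "no bound" if bound is None else bound)
```

Limits are inherited from the launching shell, so the script is only 
informative when run from the same account and shell that starts mpirun. If 
the rough bound comes out below 4096, the descriptor limit is a plausible 
suspect.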


On Nov 27, 2012, at 8:24 AM, George Markomanolis <geo...@markomanolis.com> 
wrote:

> Dear Ralph,
> 
> Thanks for the answer, I am using OMPI v1.4.1.
> 
> Best regards,
> George Markomanolis
> 
> On 11/26/2012 05:07 PM, Ralph Castain wrote:
>> What version of OMPI are you using?
>> 
>> On Nov 26, 2012, at 1:02 AM, George Markomanolis <geo...@markomanolis.com> 
>> wrote:
>> 
>>> Dear all,
>>> 
>>> First, I would like advice on how to determine the maximum number of 
>>> MPI processes that can run on a node when oversubscribing. When I 
>>> try to run an application with 4096 MPI processes on a 24-core node 
>>> with 48GB of memory, I get the error "Unknown error: 1" although less 
>>> than half of the memory is in use. I can run the same application with 
>>> 2048 MPI processes in less than one minute. I have checked the Linux 
>>> settings for the maximum number of processes, and the limit is much 
>>> larger than 4096.
>>> 
>>> A second, more general question is about discovering nodes with faulty 
>>> memory. Is there any way to identify them? I found by accident that one 
>>> node could not run an MPI application once it used more than 12GB of 
>>> RAM, while a second node with exactly the same hardware could use all 
>>> 48GB of memory. With 500+ nodes it is difficult to check all of them, 
>>> and I am not aware of any efficient solution. I first thought of 
>>> memtester, but it takes a lot of time. I know this is not exactly on 
>>> topic for this mailing list, but I thought that an OpenMPI user might 
>>> know something about it.
>>> 
>>> 
>>> Best regards,
>>> George Markomanolis
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>> 
> 

