From a quick look at the code, I don't see anything tied to 2**12 (4096) that jumps out at me. I suspect you are hitting one of two system limits. The first is the number of child processes a single process can spawn, which is different from the total number of processes allowed on the node. The second is the number of file descriptors a process can have open; we need several per process for I/O forwarding.
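If it helps, here is a minimal sketch for checking both limits from the account that launches the job. It uses Python's standard `resource` module (Unix only). Two caveats: `RLIMIT_NPROC` is counted per user rather than per parent process, so it is only the closest limit that can be queried this way, and the figure of 4 descriptors per rank is a placeholder assumption of mine, not a number taken from Open MPI. The helper names are likewise just illustrative.

```python
import resource


def format_limit(value):
    """Render an rlimit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def describe_limit(name, rlimit_id):
    """Return a one-line summary of the soft and hard values of a limit."""
    soft, hard = resource.getrlimit(rlimit_id)
    return f"{name}: soft={format_limit(soft)} hard={format_limit(hard)}"


def estimate_fd_demand(num_procs, fds_per_proc):
    """Rough count of descriptors the launcher would hold for its children."""
    return num_procs * fds_per_proc


def exceeds_limit(demand, soft_limit):
    """True if the demand is above a finite soft limit."""
    if soft_limit == resource.RLIM_INFINITY:
        return False
    return demand > soft_limit


if __name__ == "__main__":
    print(describe_limit("max processes (RLIMIT_NPROC)", resource.RLIMIT_NPROC))
    print(describe_limit("max open files (RLIMIT_NOFILE)", resource.RLIMIT_NOFILE))

    # ASSUMPTION: 4 descriptors per local rank is a placeholder value,
    # not a figure taken from Open MPI.
    fds_per_proc = 4
    soft_nofile, _ = resource.getrlimit(resource.RLIMIT_NOFILE)
    for num_procs in (2048, 4096):
        demand = estimate_fd_demand(num_procs, fds_per_proc)
        verdict = "exceeds" if exceeds_limit(demand, soft_nofile) else "fits within"
        print(f"{num_procs} ranks -> ~{demand} descriptors, {verdict} the soft limit")
```

Run it in the same shell or batch environment that starts mpirun, since limits are inherited from the launching shell and can differ between an interactive login and a batch job.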
On Nov 27, 2012, at 8:24 AM, George Markomanolis <geo...@markomanolis.com> wrote:

> Dear Ralph,
>
> Thanks for the answer, I am using OMPI v1.4.1.
>
> Best regards,
> George Markomanolis
>
> On 11/26/2012 05:07 PM, Ralph Castain wrote:
>> What version of OMPI are you using?
>>
>> On Nov 26, 2012, at 1:02 AM, George Markomanolis <geo...@markomanolis.com> wrote:
>>
>>> Dear all,
>>>
>>> Initially I would like an advice of how to identify the maximum number of
>>> MPI processes that can be executed on a node with oversubscribing. When I
>>> try to execute an application with 4096 MPI processes on a 24-cores node
>>> with 48GB of memory, I have an error "Unknown error: 1" while the memory is
>>> not even at the half. I can execute the same application with 2048 MPI
>>> processes in less than one minute. I have checked linux settings about
>>> maximum number of processes and it is much bigger than 4096.
>>>
>>> Another more generic question, is about discovering nodes with faulty
>>> memory. Is there any way to identify nodes with faulty memory? I found
>>> accidentally that a node with exact the same hardware couldn't execute an
>>> MPI application when it was using more than 12GB of ram while the second
>>> one could use all of the 48GB of memory. If I have 500+ nodes is difficult
>>> to check all of them and I am not familiar with any efficient solution.
>>> Initially I thought about memtester but it takes a lot of time. I know that
>>> this does not apply exactly on this mailing list but I thought that maybe
>>> an OpenMPI user knows something about.
>>>
>>>
>>> Best regards,
>>> George Markomanolis
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users