Re: [OMPI users] Dynamic processes connection and segfault on MPI_Comm_accept

Ralph Castain Sat, 17 Apr 2010 22:01:13 -0400

Okay, but here is the problem. If you don't use mpirun, and are not operating 
in an environment we support for "direct" launch (i.e., starting processes 
outside of mpirun), then every one of those processes thinks it is a singleton 
- yes?


What you may not realize is that each singleton immediately fork/exec's an 
orted daemon that is configured to behave just like mpirun. This is required in 
order to support MPI-2 operations such as MPI_Comm_spawn, 
MPI_Comm_connect/accept, etc.

So if you launch 64 processes that think they are singletons, then you have 64 
copies of orted running as well. This eats up a lot of file descriptors, which 
is probably why you are hitting this 65-process limit - your system is probably 
running out of file descriptors. You might check your system limits and see if 
you can get them revised upward.
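
As a rough way to check those limits, here is a minimal Python sketch. It is
not part of Open MPI, and the helper names fd_limits and raise_soft_fd_limit
are made up for illustration. It reads the per-process open-file limit through
the standard resource module and tries to raise the soft limit, capped at the
hard limit. It assumes a POSIX system, and it assumes file descriptors really
are the bottleneck, which is only a guess at this point.

```python
import resource


def fd_limits():
    """Return the (soft, hard) open-file-descriptor limits of this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_soft_fd_limit(target):
    """Try to raise the soft limit to `target`, capped at the hard limit.

    Returns the (soft, hard) limits in effect afterwards. If the system
    refuses the change, the limits are left as they were.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    if soft != resource.RLIM_INFINITY and target > soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
        except (ValueError, OSError):
            pass  # some systems enforce an extra cap below the hard limit
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    soft, hard = fd_limits()
    print("soft limit:", soft, "hard limit:", hard)
    if hard != resource.RLIM_INFINITY:
        print("after raising:", raise_soft_fd_limit(hard))
```

The limit is per process and is inherited by child processes, so a change only
helps if it is made in whatever launches the MPI processes (or in the shell,
e.g. with ulimit -n), not in an unrelated script. Raising the hard limit itself
normally needs administrator privileges.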


On Apr 17, 2010, at 4:24 PM, Grzegorz Maj wrote:

> Yes, I know. The problem is that I need to use some special way for
> running my processes provided by the environment in which I'm working
> and unfortunately I can't use mpirun.
> 
> 2010/4/18 Ralph Castain <r...@open-mpi.org>:
>> Guess I don't understand why you can't use mpirun - all it does is start 
>> things, provide a means to forward io, etc. It mainly sits there quietly 
>> without using any cpu unless required to support the job.
>> 
>> Sounds like it would solve your problem. Otherwise, I know of no way to get 
>> all these processes into comm_world.
>> 
>> 
>> On Apr 17, 2010, at 2:27 PM, Grzegorz Maj wrote:
>> 
>>> Hi,
>>> I'd like to dynamically create a group of processes communicating via
>>> MPI. Those processes need to be run without mpirun and create an
>>> intracommunicator after startup. Any ideas how to do this
>>> efficiently?
>>> I came up with a solution in which the processes are connecting one by
>>> one using MPI_Comm_connect, but unfortunately all the processes that
>>> are already in the group need to call MPI_Comm_accept. This means that
>>> when the n-th process wants to connect I need to collect all the n-1
>>> processes on the MPI_Comm_accept call. After I run about 40 processes
>>> every subsequent call takes more and more time, which I'd like to
>>> avoid.
>>> Another problem with this solution is that when I try to connect the
>>> 66th process, the root of the existing group segfaults on MPI_Comm_accept.
>>> Maybe it's my bug, but it's weird as everything works fine for at most
>>> 65 processes. Is there any limitation I don't know about?
>>> My last question is about MPI_COMM_WORLD. When I run my processes
>>> without mpirun their MPI_COMM_WORLD is the same as MPI_COMM_SELF. Is
>>> there any way to change MPI_COMM_WORLD and set it to the
>>> intracommunicator that I've created?
>>> 
>>> Thanks,
>>> Grzegorz Maj
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
