In thinking about this, my proposed solution won't entirely fix the problem - you'll still wind up with all those daemons. I believe I can resolve that one as well, but it would require a patch.
Would you like me to send you something you could try? Might take a couple of iterations to get it right...

On Apr 23, 2010, at 12:12 PM, Ralph Castain wrote:

> Hmmm....I -think- this will work, but I cannot guarantee it:
>
> 1. launch one process (can just be a spinner) using mpirun that includes the
> following option:
>
> mpirun -report-uri file
>
> where file is some filename that mpirun can create and insert its contact
> info into it. This can be a relative or absolute path. This process must
> remain alive throughout your application - doesn't matter what it does. Its
> purpose is solely to keep mpirun alive.
>
> 2. set OMPI_MCA_dpm_orte_server=FILE:file in your environment, where "file"
> is the filename given above. This will tell your processes how to find
> mpirun, which is acting as a meeting place to handle the connect/accept
> operations.
>
> Now run your processes, and have them connect/accept to each other.
>
> The reason I cannot guarantee this will work is that these processes will all
> have the same rank && name since they all start as singletons. Hence,
> connect/accept is likely to fail.
>
> But it -might- work, so you might want to give it a try.
>
> On Apr 23, 2010, at 8:10 AM, Grzegorz Maj wrote:
>
>> To be more precise: by 'server process' I mean some process that I
>> could run once on my system and it could help in creating those
>> groups.
>> My typical scenario is:
>> 1. run N separate processes, each without mpirun
>> 2. connect them into MPI group
>> 3. do some job
>> 4. exit all N processes
>> 5. goto 1
>>
>> 2010/4/23 Grzegorz Maj <ma...@wp.pl>:
>>> Thank you Ralph for your explanation.
>>> And, apart from that descriptors' issue, is there any other way to
>>> solve my problem, i.e. to run separately a number of processes,
>>> without mpirun and then to collect them into an MPI intracomm group?
>>> If I for example would need to run some 'server process' (even using
>>> mpirun) for this task, that's OK. Any ideas?
>>>
>>> Thanks,
>>> Grzegorz Maj
>>>
>>>
>>> 2010/4/18 Ralph Castain <r...@open-mpi.org>:
>>>> Okay, but here is the problem. If you don't use mpirun, and are not
>>>> operating in an environment we support for "direct" launch (i.e., starting
>>>> processes outside of mpirun), then every one of those processes thinks it
>>>> is a singleton - yes?
>>>>
>>>> What you may not realize is that each singleton immediately fork/exec's an
>>>> orted daemon that is configured to behave just like mpirun. This is
>>>> required in order to support MPI-2 operations such as MPI_Comm_spawn,
>>>> MPI_Comm_connect/accept, etc.
>>>>
>>>> So if you launch 64 processes that think they are singletons, then you
>>>> have 64 copies of orted running as well. This eats up a lot of file
>>>> descriptors, which is probably why you are hitting this 65 process limit -
>>>> your system is probably running out of file descriptors. You might check
>>>> your system limits and see if you can get them revised upward.
>>>>
>>>>
>>>> On Apr 17, 2010, at 4:24 PM, Grzegorz Maj wrote:
>>>>
>>>>> Yes, I know. The problem is that I need to use some special way for
>>>>> running my processes provided by the environment in which I'm working
>>>>> and unfortunately I can't use mpirun.
>>>>>
>>>>> 2010/4/18 Ralph Castain <r...@open-mpi.org>:
>>>>>> Guess I don't understand why you can't use mpirun - all it does is start
>>>>>> things, provide a means to forward io, etc. It mainly sits there quietly
>>>>>> without using any cpu unless required to support the job.
>>>>>>
>>>>>> Sounds like it would solve your problem. Otherwise, I know of no way to
>>>>>> get all these processes into comm_world.
>>>>>>
>>>>>>
>>>>>> On Apr 17, 2010, at 2:27 PM, Grzegorz Maj wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>> I'd like to dynamically create a group of processes communicating via
>>>>>>> MPI. Those processes need to be run without mpirun and create an
>>>>>>> intracommunicator after the startup.
>>>>>>> Any ideas how to do this
>>>>>>> efficiently?
>>>>>>> I came up with a solution in which the processes are connecting one by
>>>>>>> one using MPI_Comm_connect, but unfortunately all the processes that
>>>>>>> are already in the group need to call MPI_Comm_accept. This means that
>>>>>>> when the n-th process wants to connect I need to collect all the n-1
>>>>>>> processes on the MPI_Comm_accept call. After I run about 40 processes
>>>>>>> every subsequent call takes more and more time, which I'd like to
>>>>>>> avoid.
>>>>>>> Another problem in this solution is that when I try to connect the 66th
>>>>>>> process the root of the existing group segfaults on MPI_Comm_accept.
>>>>>>> Maybe it's my bug, but it's weird as everything works fine for at most
>>>>>>> 65 processes. Is there any limitation I don't know about?
>>>>>>> My last question is about MPI_COMM_WORLD. When I run my processes
>>>>>>> without mpirun their MPI_COMM_WORLD is the same as MPI_COMM_SELF. Is
>>>>>>> there any way to change MPI_COMM_WORLD and set it to the
>>>>>>> intracommunicator that I've created?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Grzegorz Maj
>>>>>>> _______________________________________________
>>>>>>> users mailing list
>>>>>>> us...@open-mpi.org
>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users