Alex,

The code looks good and is 100% compliant with the MPI standard.

I would, however, change the way you create the sub-communicators in the parent. You do a lot of unnecessary operations: you can achieve exactly the same outcome (one communicator per node) either by duplicating MPI_COMM_SELF or by doing an MPI_Comm_split with the color equal to your rank.
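(For illustration only, a minimal, untested sketch of the two alternatives; the program and variable names are placeholders, not taken from Alex's parent.F.)

    program per_rank_comm
      use mpi
      implicit none
      integer :: rank, self_dup, split_comm, ierr

      call MPI_Init(ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

      ! option 1: duplicate MPI_COMM_SELF, giving one single-process
      ! communicator per parent task
      call MPI_Comm_dup(MPI_COMM_SELF, self_dup, ierr)

      ! option 2: the same outcome via a split with a unique color per rank
      call MPI_Comm_split(MPI_COMM_WORLD, rank, 0, split_comm, ierr)

      call MPI_Comm_free(self_dup, ierr)
      call MPI_Comm_free(split_comm, ierr)
      call MPI_Finalize(ierr)
    end program per_rank_comm

Either communicator could then be passed as the comm argument of MPI_Comm_spawn so that each parent task drives its own group of children.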
George.

On Sun, Dec 14, 2014 at 2:20 AM, Alex A. Schmidt <a...@ufsm.br> wrote:

> Hi,
>
> Sorry, guys. I don't think the newbie here can follow any discussion beyond basic MPI...
>
> Anyway, if I add the pair
>
>     call MPI_COMM_GET_PARENT(mpi_comm_parent,ierror)
>     call MPI_COMM_DISCONNECT(mpi_comm_parent,ierror)
>
> on the spawnee side, I get the proper response in the spawning processes.
>
> Please take a look at the attached toy codes parent.F and child.F I've been playing with.
> 'mpirun -n 2 parent' seems to work as expected.
>
> Alex
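(For illustration only, and not Alex's attached child.F: a minimal spawnee-side sketch of the pattern he describes. The MPI_Allreduce stands in for whatever work the spawned group really does.)

    program child
      use mpi
      implicit none
      integer :: parent_comm, rank, n, nsum, ierr

      call MPI_Init(ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

      ! placeholder for the real work done by the spawned group
      n = rank
      call MPI_Allreduce(n, nsum, 1, MPI_INTEGER, MPI_SUM, MPI_COMM_WORLD, ierr)

      ! release the link back to the spawner
      call MPI_Comm_get_parent(parent_comm, ierr)
      if (parent_comm /= MPI_COMM_NULL) call MPI_Comm_disconnect(parent_comm, ierr)

      call MPI_Finalize(ierr)
    end program child

The MPI_COMM_NULL check keeps the same executable usable when it is launched directly instead of being spawned.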
> 2014-12-13 23:46 GMT-02:00 Gilles Gouaillardet <gilles.gouaillar...@gmail.com>:
>>
>> Alex,
>>
>> Are you calling MPI_Comm_disconnect in the 3 "master" tasks and with the same remote
>> communicator?
>>
>> I also read the man page again, and MPI_Comm_disconnect does not ensure that the remote
>> processes have finished or called MPI_Comm_disconnect, so that might not be the thing you
>> need. George, can you please comment on that?
>>
>> Cheers,
>>
>> Gilles
>>
>> George Bosilca <bosi...@icl.utk.edu> wrote:
>> MPI_Comm_disconnect should be a local operation; there is no reason for it to deadlock.
>> I looked at the code and everything is local with the exception of a call to PMIX.FENCE.
>> Can you attach to your deadlocked processes and confirm that they are stopped in the
>> pmix.fence?
>>
>> George.
>>
>> On Sat, Dec 13, 2014 at 8:47 AM, Alex A. Schmidt <a...@ufsm.br> wrote:
>>
>>> Hi,
>>>
>>> Sorry, I was calling mpi_comm_disconnect on the group comm handle, not on the intercomm
>>> handle returned from the spawn call, as it should be.
>>>
>>> Well, calling the disconnect on the intercomm handle does halt the spawner side, but the
>>> wait is never completed since, as George points out, there is no disconnect call being
>>> made on the spawnee side... and that brings me back to the beginning of the problem
>>> since, being a third-party app, that call would never be there. I guess an MPI wrapper to
>>> deal with that could be made for the app, but I feel the wrapper itself would, in the
>>> end, face the same problem we face right now.
>>>
>>> My application is a genetic algorithm code that searches for optimal configurations
>>> (minimum or maximum energy) of clusters of atoms. The workflow bottleneck is the
>>> calculation of the cluster energy. For the cases where an analytical potential is
>>> available, the calculation can be made internally and the workload is distributed among
>>> slave nodes from a master node. This is also done when an analytical potential is not
>>> available and the energy calculation must be done externally by a quantum chemistry code
>>> like dftb+, siesta or Gaussian. So far, we have been running these codes in serial mode.
>>> Needless to say, we could do a lot better if they could be executed in parallel.
>>>
>>> I am not familiar with DRMAA, but it seems to be the right choice to deal with job
>>> schedulers, as it covers the ones I am interested in (PBS/Torque and LoadLeveler).
>>>
>>> Alex
>>>
>>> 2014-12-13 7:49 GMT-02:00 Gilles Gouaillardet <gilles.gouaillar...@gmail.com>:
>>>>
>>>> George is right about the semantics.
>>>>
>>>> However, I am surprised it returns immediately... that should either work or hang, IMHO.
>>>>
>>>> The second point is not MPI related any more; it is batch manager specific.
>>>>
>>>> You will likely find a submit parameter to make the command block until the job
>>>> completes. Or you can write your own wrapper. Or you can retrieve the jobid and qstat
>>>> periodically to get the job state. If an API is available, this is also an option.
>>>>
>>>> Cheers,
>>>>
>>>> Gilles
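(A sketch of how Gilles's suggestion might look from the Fortran side. The script name job.sh is a placeholder, and the blocking flag "-W block=true" is an assumption about recent Torque/PBS versions; check your own batch system for the equivalent.)

    program submit_and_wait
      implicit none
      integer :: stat

      ! option A: ask qsub itself to block until the job finishes
      ! (assumes the batch system supports "-W block=true")
      call execute_command_line("qsub -W block=true job.sh", exitstat=stat)

      ! option B: submit, then poll qstat until the job leaves the queue
      call execute_command_line( &
           "jid=$(qsub job.sh); while qstat $jid > /dev/null 2>&1; do sleep 30; done", &
           exitstat=stat)
    end program submit_and_wait

execute_command_line is the standard Fortran 2008 equivalent of the call system(...) used elsewhere in the thread; the env -i caveats Gilles raises later in the thread still apply.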
>>>> George Bosilca <bosi...@icl.utk.edu> wrote:
>>>> You have to call MPI_Comm_disconnect on both sides of the intercommunicator. On the
>>>> spawner processes you should call it on the intercomm, while on the spawnees you should
>>>> call it on the communicator returned by MPI_Comm_get_parent.
>>>>
>>>> George.
>>>>
>>>> On Dec 12, 2014, at 20:43, Alex A. Schmidt <a...@ufsm.br> wrote:
>>>>
>>>> Gilles,
>>>>
>>>> MPI_comm_disconnect seems to work, but not quite. The call to it returns almost
>>>> immediately while the spawned processes keep piling up in the background until they are
>>>> all done...
>>>>
>>>> I think system('env -i qsub...') to launch the third-party apps would take the execution
>>>> of every call back to the scheduler queue. How would I track each one for their
>>>> completion?
>>>>
>>>> Alex
>>>>
>>>> 2014-12-12 22:35 GMT-02:00 Gilles Gouaillardet <gilles.gouaillar...@gmail.com>:
>>>>>
>>>>> Alex,
>>>>>
>>>>> You need MPI_Comm_disconnect at least. I am not sure if this is 100% correct or working.
>>>>>
>>>>> If you are using third-party apps, why don't you do something like
>>>>> system("env -i qsub ...")
>>>>> with the right options to make qsub blocking, or manually wait for the end of the job?
>>>>>
>>>>> That looks like a much cleaner and simpler approach to me.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Gilles
>>>>>
>>>>> "Alex A. Schmidt" <a...@ufsm.br> wrote:
>>>>> Hello Gilles,
>>>>>
>>>>> Ok, I believe I have a simple toy app running as I think it should: 'n' parent
>>>>> processes running under MPI_COMM_WORLD, each one spawning its own 'm' child processes
>>>>> (each child group works together nicely, returning the expected result for an
>>>>> MPI_Allreduce call).
>>>>>
>>>>> Now, as I mentioned before, the apps I want to run in the spawned processes are
>>>>> third-party MPI apps and I don't think it will be possible to exchange messages with
>>>>> them from my app. So, how do I tell when the spawned processes have finished running?
>>>>> All I have to work with is the intercommunicator returned from the MPI_Comm_spawn
>>>>> call...
>>>>>
>>>>> Alex
>>>>>
>>>>> 2014-12-12 2:42 GMT-02:00 Alex A. Schmidt <a...@ufsm.br>:
>>>>>>
>>>>>> Gilles,
>>>>>>
>>>>>> Well, yes, I guess...
>>>>>>
>>>>>> I'll do tests with the real third-party apps and let you know. These are huge quantum
>>>>>> chemistry codes (dftb+, siesta and Gaussian) which greatly benefit from a parallel
>>>>>> environment. My code is just a front end to use those, but since we have a lot of data
>>>>>> to process it also benefits from a parallel environment.
>>>>>>
>>>>>> Alex
>>>>>>
>>>>>> 2014-12-12 2:30 GMT-02:00 Gilles Gouaillardet <gilles.gouaillar...@iferc.org>:
>>>>>>>
>>>>>>> Alex,
>>>>>>>
>>>>>>> Just to make sure... this is the behavior you expected, right?
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Gilles
>>>>>>>
>>>>>>> On 2014/12/12 13:27, Alex A. Schmidt wrote:
>>>>>>>
>>>>>>> Gilles,
>>>>>>>
>>>>>>> Ok, very nice!
>>>>>>>
>>>>>>> When I execute
>>>>>>>
>>>>>>>     do rank=1,3
>>>>>>>        call MPI_Comm_spawn('hello_world',' ',5,MPI_INFO_NULL,rank, &
>>>>>>>             MPI_COMM_WORLD,my_intercomm,MPI_ERRCODES_IGNORE,status)
>>>>>>>     enddo
>>>>>>>
>>>>>>> I do get 15 instances of the 'hello_world' app running: 5 for each parent rank 1, 2
>>>>>>> and 3.
>>>>>>>
>>>>>>> Thanks a lot, Gilles.
>>>>>>>
>>>>>>> Best regards,
>>>>>>>
>>>>>>> Alex
>>>>>>>
>>>>>>> 2014-12-12 1:32 GMT-02:00 Gilles Gouaillardet <gilles.gouaillar...@iferc.org>:
>>>>>>>
>>>>>>> Alex,
>>>>>>>
>>>>>>> Just ask MPI_Comm_spawn to start (up to) 5 tasks via the maxprocs parameter:
>>>>>>>
>>>>>>>     int MPI_Comm_spawn(char *command, char *argv[], int maxprocs,
>>>>>>>                        MPI_Info info, int root, MPI_Comm comm,
>>>>>>>                        MPI_Comm *intercomm, int array_of_errcodes[])
>>>>>>>
>>>>>>>     INPUT PARAMETERS
>>>>>>>         maxprocs - maximum number of processes to start
>>>>>>>                    (integer, significant only at root)
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Gilles
>>>>>>>
>>>>>>> On 2014/12/12 12:23, Alex A. Schmidt wrote:
>>>>>>>
>>>>>>> Hello Gilles,
>>>>>>>
>>>>>>> Thanks for your reply. The "env -i PATH=..." stuff seems to work!!!
>>>>>>>
>>>>>>>     call system("sh -c 'env -i PATH=/usr/lib64/openmpi/bin:/bin mpirun -n 2 hello_world' ")
>>>>>>>
>>>>>>> did produce the expected result with a simple Open MPI "hello_world" code I wrote.
>>>>>>>
>>>>>>> It might be harder, though, with the real third-party app I have in mind. And I
>>>>>>> realize that getting this past a job scheduler might not work at all...
>>>>>>>
>>>>>>> I have looked at the MPI_Comm_spawn call but I failed to understand how it could help
>>>>>>> here. For instance, can I use it to launch an MPI app with the option "-n 5"?
>>>>>>>
>>>>>>> Alex
>>>>>>>
>>>>>>> 2014-12-12 0:36 GMT-02:00 Gilles Gouaillardet <gilles.gouaillar...@iferc.org>:
>>>>>>>
>>>>>>> Alex,
>>>>>>>
>>>>>>> Can you try something like
>>>>>>>
>>>>>>>     call system("sh -c 'env -i /.../mpirun -np 2 /.../app_name'")
>>>>>>>
>>>>>>> -i starts with an empty environment. That being said, you might need to set a few
>>>>>>> environment variables manually:
>>>>>>>
>>>>>>>     env -i PATH=/bin ...
>>>>>>>
>>>>>>> That being also said, this "trick" could be just a bad idea: you might be using a
>>>>>>> scheduler, and if you empty the environment, the scheduler will not be aware of the
>>>>>>> "inside" run. On top of that, invoking system might fail depending on the
>>>>>>> interconnect you use.
>>>>>>>
>>>>>>> Bottom line, I believe Ralph's reply is still valid, even if five years have passed:
>>>>>>> changing your workflow, or using MPI_Comm_spawn, is a much better approach.
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Gilles
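(To show the maxprocs route from Fortran rather than the C prototype above, a minimal spawner sketch; 'hello_world' is the toy executable already mentioned in the thread, and everything else is illustrative.)

    program spawner
      use mpi
      implicit none
      integer :: intercomm, ierr

      call MPI_Init(ierr)

      ! start (up to) 5 copies of the child; rank 0 is the root of the spawn
      call MPI_Comm_spawn('hello_world', MPI_ARGV_NULL, 5, MPI_INFO_NULL, 0, &
                          MPI_COMM_WORLD, intercomm, MPI_ERRCODES_IGNORE, ierr)

      ! disconnect this side of the intercommunicator; the spawned tasks
      ! should do the same on the communicator from MPI_Comm_get_parent
      call MPI_Comm_disconnect(intercomm, ierr)

      call MPI_Finalize(ierr)
    end program spawner

Whether the disconnect also guarantees that the children have finished is exactly the point debated earlier in this thread, so treat this as a sketch rather than a synchronization recipe.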
>>>>>>> On 2014/12/12 11:22, Alex A. Schmidt wrote:
>>>>>>>
>>>>>>> Dear Open MPI users,
>>>>>>>
>>>>>>> Regarding this previous post from 2009,
>>>>>>> <http://www.open-mpi.org/community/lists/users/2009/06/9560.php>
>>>>>>> I wonder if the reply from Ralph Castain is still valid. My need is similar but
>>>>>>> simpler: to make a system call from an Open MPI Fortran application to run a
>>>>>>> third-party Open MPI application. I don't need to exchange MPI messages with the
>>>>>>> application; I just need to read the resulting output file generated by it. I have
>>>>>>> tried to do the following system call from my Fortran Open MPI code:
>>>>>>>
>>>>>>>     call system("sh -c 'mpirun -n 2 app_name'")
>>>>>>>
>>>>>>> but I get
>>>>>>>
>>>>>>>     **********************************************************
>>>>>>>     Open MPI does not support recursive calls of mpirun
>>>>>>>     **********************************************************
>>>>>>>
>>>>>>> Is there a way to make this work?
>>>>>>>
>>>>>>> Best regards,
>>>>>>>
>>>>>>> Alex