Alex,

Are you calling MPI_Comm_disconnect in the 3 "master" tasks and with the same 
remote communicator ?

I also read the man page again, and MPI_Comm_disconnect does not ensure the 
remote processes have finished or called MPI_Comm_disconnect, so that might not 
be the thing you need.
George, can you please comment on that ?

Cheers,

Gilles

George Bosilca <bosi...@icl.utk.edu> wrote:
>MPI_Comm_disconnect should be a local operation, there is no reason for it to 
>deadlock. I looked at the code and everything is local with the exception of a 
>call to PMIX.FENCE. Can you attach to your deadlocked processes and confirm 
>that they are stopped in the pmix.fence?
>
>
>  George.
>
>
>
>On Sat, Dec 13, 2014 at 8:47 AM, Alex A. Schmidt <a...@ufsm.br> wrote:
>
>Hi
>
>Sorry, I was calling mpi_comm_disconnect on the group comm handle, not
>on the intercomm handle returned from the spawn call, as it should be.
>
>Well, calling the disconnect on the intercomm handle does halt the spawner
>side, but the wait never completes since, as George points out, there is no
>disconnect call being made on the spawnee side... and that brings me back
>to the beginning of the problem since, being a third party app, that call would
>never be there. I guess an mpi wrapper to deal with that could be made for
>the app, but I feel the wrapper itself, in the end, would face the same problem
>we face right now.
>
>My application is a genetic algorithm code that searches for the optimal
>configuration (minimum or maximum energy) of clusters of atoms. The workflow
>bottleneck is the calculation of the cluster energy. For the cases in which an
>analytical potential is available, the calculation can be made internally and
>the workload is distributed among slave nodes from a master node. This is also
>done when an analytical potential is not available and the energy calculation
>must be done externally by a quantum chemistry code like dftb+, siesta and
>Gaussian. So far, we have been running these codes in serial mode. Needless to
>say, we could do a lot better if they could be executed in parallel.
>
>I am not familiar with DRMAA but it seems to be the right choice to deal with
>job schedulers as it covers the ones I am interested in (pbs/torque and
>loadleveler).
>
>Alex
>
>
>2014-12-13 7:49 GMT-02:00 Gilles Gouaillardet <gilles.gouaillar...@gmail.com>:
>
>George is right about the semantics.
>
>However, I am surprised it returns immediately...
>That should either work or hang, imho.
>
>The second point is no longer MPI related, and is batch manager specific.
>
>You will likely find a submit parameter to make the command block until the 
>job completes. Or you can write your own wrapper.
>Or you can retrieve the jobid and qstat periodically to get the job state.
>If an api is available, this is also an option.
>
>Cheers,
>
>Gilles
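
The qstat polling option mentioned above could look roughly like the sketch below. It is only an illustration under stated assumptions: the helper names (qstat_job_state, wait_for_job) are hypothetical, it assumes a PBS/Torque-style `qstat -f <jobid>` that prints a `job_state = X` line, and it assumes that "C" (or qstat no longer knowing the job) means the job is done. Other schedulers report completion differently, so the finished states are a parameter.

```python
import subprocess
import time


def qstat_job_state(job_id):
    """Return the job_state letter from `qstat -f`, or None if qstat does not know the job."""
    result = subprocess.run(["qstat", "-f", job_id], capture_output=True, text=True)
    if result.returncode != 0:
        # Assumption: a failing qstat means the job has already left the queue.
        return None
    for line in result.stdout.splitlines():
        key, _, value = line.partition("=")
        if key.strip() == "job_state":
            return value.strip()
    return None


def wait_for_job(job_id, get_state=qstat_job_state, finished_states=("C",),
                 poll_seconds=30.0, timeout=None):
    """Poll the scheduler until the job is finished. Return True, or False on timeout."""
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        state = get_state(job_id)
        if state is None or state in finished_states:
            return True
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)
```

The caller would keep the job id printed by qsub and pass it to wait_for_job before reading the output file. The get_state argument is there so the polling loop can be exercised without a scheduler.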
>
>George Bosilca <bosi...@icl.utk.edu> wrote:
>You have to call MPI_Comm_disconnect on both sides of the intercommunicator.
>On the spawner processes you should call it on the intercomm, while on the
>spawnees you should call it on the communicator returned by
>MPI_Comm_get_parent.
>
>
>  George.
>
>
>On Dec 12, 2014, at 20:43 , Alex A. Schmidt <a...@ufsm.br> wrote:
>
>
>Gilles,
>
>MPI_comm_disconnect seems to work, but not quite.
>The call to it returns almost immediately while
>the spawned processes keep piling up in the background
>until they are all done...
>
>I think system('env -i qsub...') to launch the third party apps
>would take the execution of every call back to the scheduler
>queue. How would I track each one for completion?
>
>Alex
>
>
>2014-12-12 22:35 GMT-02:00 Gilles Gouaillardet <gilles.gouaillar...@gmail.com>:
>
>Alex,
>
>You need MPI_Comm_disconnect at least.
>I am not sure this is 100% correct, or that it works.
>
>If you are using third party apps, why don't you do something like
>system("env -i qsub ...")
>with the right options to make qsub blocking, or manually wait for the end
>of the job ?
>
>That looks like a much cleaner and simpler approach to me.
>
>Cheers,
>
>Gilles
>
>"Alex A. Schmidt" <a...@ufsm.br> wrote:
>
>Hello Gilles,
>
>Ok, I believe I have a simple toy app running as I think it should:
>'n' parent processes running under mpi_comm_world, each one
>spawning its own 'm' child processes (each child group works
>together nicely, returning the expected result for an mpi_allreduce call).
>
>Now, as I mentioned before, the apps I want to run in the spawned
>processes are third party mpi apps and I don't think it will be possible
>to exchange messages with them from my app. So, how do I tell
>when the spawned processes have finished running? All I have to work
>with is the intercommunicator returned from the mpi_comm_spawn call...
>
>Alex
>
>
>
>
>
>2014-12-12 2:42 GMT-02:00 Alex A. Schmidt <a...@ufsm.br>:
>
>Gilles,
>
>Well, yes, I guess....
>
>I'll do tests with the real third party apps and let you know.
>These are huge quantum chemistry codes (dftb+, siesta and Gaussian)
>which greatly benefit from a parallel environment. My code is just
>a front end to use those, but since we have a lot of data to process
>it also benefits from a parallel environment.
>
>Alex
>
> 
>
>
>2014-12-12 2:30 GMT-02:00 Gilles Gouaillardet <gilles.gouaillar...@iferc.org>:
>
>Alex,
>
>just to make sure ...
>this is the behavior you expected, right ?
>
>Cheers,
>
>Gilles
>
>
>
>On 2014/12/12 13:27, Alex A. Schmidt wrote:
>
>Gilles,
>
>Ok, very nice!
>
>When I execute
>
>  do rank=1,3
>    call MPI_Comm_spawn('hello_world',' ',5,MPI_INFO_NULL,rank, &
>                        MPI_COMM_WORLD,my_intercomm,MPI_ERRCODES_IGNORE,status)
>  enddo
>
>I do get 15 instances of the 'hello_world' app running: 5 for each
>parent rank 1, 2 and 3.
>
>Thanks a lot, Gilles.
>
>Best regards,
>
>Alex
>
>
>2014-12-12 1:32 GMT-02:00 Gilles Gouaillardet <gilles.gouaillar...@iferc.org>:
>
>Alex,
>
>just ask MPI_Comm_spawn to start (up to) 5 tasks via the maxprocs
>parameter :
>
>  int MPI_Comm_spawn(char *command, char *argv[], int maxprocs,
>                     MPI_Info info, int root, MPI_Comm comm,
>                     MPI_Comm *intercomm, int array_of_errcodes[])
>
>INPUT PARAMETERS
>  maxprocs - maximum number of processes to start (integer, significant
>             only at root)
>
>Cheers,
>
>Gilles
>
>
>On 2014/12/12 12:23, Alex A. Schmidt wrote:
>
>Hello Gilles,
>
>Thanks for your reply. The "env -i PATH=..." stuff seems to work!!!
>
>  call system("sh -c 'env -i PATH=/usr/lib64/openmpi/bin:/bin mpirun -n 2 hello_world' ")
>
>did produce the expected result with a simple openmpi "hello_world" code I
>wrote. It might be harder though with the real third party app I have in
>mind. And I realize getting past a job scheduler with this approach might
>not work at all...
>
>I have looked at the MPI_Comm_spawn call but I failed to understand how
>it could help here. For instance, can I use it to launch an mpi app with the
>option "-n 5" ?
>
>Alex
>
>
>2014-12-12 0:36 GMT-02:00 Gilles Gouaillardet <gilles.gouaillar...@iferc.org>:
>
>Alex,
>
>can you try something like
>
>  call system("sh -c 'env -i /.../mpirun -np 2 /.../app_name'")
>
>-i starts with an empty environment.
>That being said, you might need to set a few environment variables
>manually :
>
>  env -i PATH=/bin ...
>
>And that being also said, this "trick" could be just a bad idea :
>you might be using a scheduler, and if you empty the environment, the
>scheduler will not be aware of the "inside" run.
>
>On top of that, invoking system might fail depending on the interconnect
>you use.
>
>Bottom line, I believe Ralph's reply is still valid, even if five years
>have passed : changing your workflow, or using MPI_Comm_spawn, is a much
>better approach.
>
>Cheers,
>
>Gilles
>
>
>On 2014/12/12 11:22, Alex A. Schmidt wrote:
>
>Dear OpenMPI users,
>
>Regarding this previous post
><http://www.open-mpi.org/community/lists/users/2009/06/9560.php>
>from 2009, I wonder if the reply from Ralph Castain is still valid. My need
>is similar but quite a bit simpler: to make a system call from an openmpi
>fortran application to run a third party openmpi application. I don't need
>to exchange mpi messages with the application. I just need to read the
>resulting output file generated by it.
>
>I have tried to do the following system call from my fortran openmpi code:
>
>  call system("sh -c 'mpirun -n 2 app_name'")
>
>but I get
>
>**********************************************************
>Open MPI does not support recursive calls of mpirun
>**********************************************************
>
>Is there a way to make this work?
>
>Best regards,
>
>Alex
>
>_______________________________________________
>users mailing list
>us...@open-mpi.org
>Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>Link to this post:
>http://www.open-mpi.org/community/lists/users/2014/12/25966.php
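
For readers who want to try the "env -i" trick discussed in this thread outside of Fortran, here is a minimal sketch of the same idea. The function name run_in_clean_env and its parameters are hypothetical, chosen for illustration. It relies only on the fact that passing `env=` to subprocess.run replaces the inherited environment rather than extending it. Gilles's caveats still apply: a scheduler may not be aware of a run started this way, and it may fail depending on the interconnect.

```python
import subprocess


def run_in_clean_env(command, path="/usr/bin:/bin", extra_env=None):
    """Run command with only PATH (plus extra_env) set, like `env -i PATH=... command`."""
    env = {"PATH": path}
    if extra_env:
        env.update(extra_env)
    # env= replaces the parent's environment, so variables exported by an
    # outer mpirun are not inherited by the child process.
    return subprocess.run(command, env=env, capture_output=True, text=True)
```

With the paths quoted earlier in the thread, the call would be something like run_in_clean_env(["mpirun", "-n", "2", "hello_world"], path="/usr/lib64/openmpi/bin:/bin"). Whether the real third party application needs further variables in extra_env is not something the thread settles.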
