You should be able to just include that in the argv you pass to the MPI_Comm_spawn API.
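For reference, a minimal Fortran sketch of passing arguments through MPI_COMM_SPAWN's argv parameter (the program name 'siesta', the file names and maxprocs value are only placeholders). In the Fortran binding the argument list is terminated by a blank string. Note that the entries reach the spawned processes as plain command-line arguments with no shell in between, so whether tokens like '<' and 'infile' have any effect depends entirely on the spawned application:

    program spawn_with_args
       use mpi
       implicit none
       integer :: ierr, intercomm
       character(len=32) :: args(5)

       call MPI_INIT(ierr)

       ! Hypothetical argument list: entries are delivered verbatim to the
       ! spawned program (no shell expands '<' or '>'); the trailing blank
       ! string terminates the list, as required by the Fortran binding.
       args(1) = '<'
       args(2) = 'infile'
       args(3) = '>'
       args(4) = 'outfile'
       args(5) = ' '

       call MPI_COMM_SPAWN('siesta', args, 5, MPI_INFO_NULL, 0, &
                           MPI_COMM_SELF, intercomm, MPI_ERRCODES_IGNORE, ierr)

       call MPI_FINALIZE(ierr)
    end program spawn_with_args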
On Mon, Dec 15, 2014 at 9:27 AM, Alex A. Schmidt <a...@ufsm.br> wrote:

George,

Thanks for the tip. In fact, calling mpi_comm_spawn right away with MPI_COMM_SELF has worked for me just as well -- no subgroups needed at all.

I am testing this openmpi app named "siesta" in parallel. The source code is available, so making it "spawn ready" by adding the pair mpi_comm_get_parent + mpi_comm_disconnect into the main code can be done. If it works, maybe siesta's developers can be convinced to add this feature in a future release.

However, siesta is launched only by specifying input/output files with i/o redirection, like

    mpirun -n <some number> siesta < infile > outfile

So far, I could not find anything about how to set an stdin file for a spawnee process. Specifying it in an app context file doesn't seem to work. Can it be done? Maybe through an MCA parameter?

Alex

2014-12-15 2:43 GMT-02:00 George Bosilca <bosi...@icl.utk.edu>:

Alex,

The code looks good, and is 100% MPI standard accurate.

I would change the way you create the subcomms in the parent. You do a lot of useless operations, as you can achieve exactly the same outcome (one communicator per node) either by duplicating MPI_COMM_SELF or by doing an MPI_Comm_split with the color equal to your rank.

George.

On Sun, Dec 14, 2014 at 2:20 AM, Alex A. Schmidt <a...@ufsm.br> wrote:

Hi,

Sorry, guys. I don't think the newbie here can follow any discussion beyond basic mpi...

Anyway, if I add the pair

    call MPI_COMM_GET_PARENT(mpi_comm_parent,ierror)
    call MPI_COMM_DISCONNECT(mpi_comm_parent,ierror)

on the spawnee side, I get the proper response in the spawning processes.

Please take a look at the attached toy codes parent.F and child.F I've been playing with. 'mpirun -n 2 parent' seems to work as expected.

Alex

2014-12-13 23:46 GMT-02:00 Gilles Gouaillardet <gilles.gouaillar...@gmail.com>:

Alex,

Are you calling MPI_Comm_disconnect in the 3 "master" tasks and with the same remote communicator?

I also read the man page again, and MPI_Comm_disconnect does not ensure that the remote processes have finished or called MPI_Comm_disconnect, so that might not be the thing you need. George, can you please comment on that?

Cheers,

Gilles

George Bosilca <bosi...@icl.utk.edu> wrote:

MPI_Comm_disconnect should be a local operation; there is no reason for it to deadlock. I looked at the code and everything is local with the exception of a call to PMIX.FENCE. Can you attach to your deadlocked processes and confirm that they are stopped in the pmix.fence?

George.
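To make the disconnect pattern under discussion concrete, here is a minimal sketch (not the attached parent.F/child.F, just an illustration; 'child' stands for whatever executable is spawned): the spawner disconnects the intercommunicator returned by MPI_COMM_SPAWN, and the spawned program disconnects the communicator returned by MPI_COMM_GET_PARENT. As noted above, MPI_Comm_disconnect by itself does not guarantee that the remote processes have finished.

    ! --- parent side (e.g. parent.f90) ---
    program parent_side
       use mpi
       implicit none
       integer :: ierr, intercomm
       call MPI_INIT(ierr)
       ! each parent rank spawns its own children off MPI_COMM_SELF
       call MPI_COMM_SPAWN('child', MPI_ARGV_NULL, 2, MPI_INFO_NULL, 0, &
                           MPI_COMM_SELF, intercomm, MPI_ERRCODES_IGNORE, ierr)
       ! ... do other work, or talk to the children over intercomm ...
       call MPI_COMM_DISCONNECT(intercomm, ierr)   ! parent side of the pair
       call MPI_FINALIZE(ierr)
    end program parent_side

    ! --- child side (e.g. child.f90, built as the 'child' executable) ---
    program child_side
       use mpi
       implicit none
       integer :: ierr, parent_comm
       call MPI_INIT(ierr)
       call MPI_COMM_GET_PARENT(parent_comm, ierr)
       ! ... the child's real work goes here ...
       if (parent_comm /= MPI_COMM_NULL) then
          call MPI_COMM_DISCONNECT(parent_comm, ierr)  ! child side of the pair
       end if
       call MPI_FINALIZE(ierr)
    end program child_side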
On Sat, Dec 13, 2014 at 8:47 AM, Alex A. Schmidt <a...@ufsm.br> wrote:

Hi,

Sorry, I was calling mpi_comm_disconnect on the group comm handle, not on the intercomm handle returned from the spawn call as it should be.

Well, calling the disconnect on the intercomm handle does halt the spawner side, but the wait is never completed since, as George points out, there is no disconnect call being made on the spawnee side... and that brings me back to the beginning of the problem: being a third party app, that call would never be there. I guess an mpi wrapper to deal with that could be made for the app, but I feel the wrapper itself, in the end, would face the same problem we face right now.

My application is a genetic algorithm code that searches for optimal configurations (minimum or maximum energy) of clusters of atoms. The workflow bottleneck is the calculation of the cluster energy. For the cases in which an analytical potential is available, the calculation can be made internally and the workload is distributed among slave nodes from a master node. This is also done when an analytical potential is not available and the energy calculation must be done externally by a quantum chemistry code like dftb+, siesta or Gaussian. So far, we have been running these codes in serial mode. Needless to say, we could do a lot better if they could be executed in parallel.

I am not familiar with DRMAA, but it seems to be the right choice to deal with job schedulers, as it covers the ones I am interested in (PBS/Torque and LoadLeveler).

Alex

2014-12-13 7:49 GMT-02:00 Gilles Gouaillardet <gilles.gouaillar...@gmail.com>:

George is right about the semantics.

However, I am surprised it returns immediately... that should either work or hang, imho.

The second point is no longer MPI related, and is batch manager specific. You will likely find a submit parameter to make the command block until the job completes. Or you can write your own wrapper. Or you can retrieve the jobid and qstat periodically to get the job state. If an API is available, that is also an option.

Cheers,

Gilles

George Bosilca <bosi...@icl.utk.edu> wrote:

You have to call MPI_Comm_disconnect on both sides of the intercommunicator. On the spawner processes you should call it on the intercomm, while on the spawnees you should call it on the communicator returned by MPI_Comm_get_parent.

George.

On Dec 12, 2014, at 20:43, Alex A. Schmidt <a...@ufsm.br> wrote:

Gilles,

MPI_comm_disconnect seems to work, but not quite: the call to it returns almost immediately, while the spawned processes keep piling up in the background until they are all done...

I think system('env -i qsub...') to launch the third party apps would take the execution of every call back to the scheduler queue. How would I track each one for its completion?

Alex

2014-12-12 22:35 GMT-02:00 Gilles Gouaillardet <gilles.gouaillar...@gmail.com>:

Alex,

You need MPI_Comm_disconnect at least. I am not sure if this is 100% correct nor working.

If you are using third party apps, why don't you do something like system("env -i qsub ...") with the right options to make qsub blocking, or manually wait for the end of the job? That looks like a much cleaner and simpler approach to me.

Cheers,

Gilles
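One way to follow that suggestion from the parent code is sketched below. The script name 'run_siesta.pbs' and the exact qsub/qstat behavior are assumptions to be adapted to the local batch system (some qsub variants also offer a blocking submit option); the standard execute_command_line intrinsic is used instead of the non-standard system() so the exit status can be checked. Note also the caveat later in the thread about invoking external commands from an MPI process on some interconnects.

    subroutine run_and_wait()
       implicit none
       integer :: stat
       ! Hypothetical submit script; the job id printed by qsub is saved to a file.
       ! 'env -i' follows the suggestion above and may need PATH etc. re-added.
       call execute_command_line("env -i qsub run_siesta.pbs > jobid.txt", exitstat=stat)
       ! Poll the scheduler: on many PBS/Torque installations qstat returns a
       ! nonzero status once the job id is no longer known to the queue.
       do
          call execute_command_line("qstat `cat jobid.txt` > /dev/null 2>&1", exitstat=stat)
          if (stat /= 0) exit          ! job has left the queue
          call execute_command_line("sleep 30")
       end do
    end subroutine run_and_wait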
"Alex A. Schmidt" <a...@ufsm.br> wrote:

Hello Gilles,

Ok, I believe I have a simple toy app running as I think it should: 'n' parent processes running under mpi_comm_world, each one spawning its own 'm' child processes (each child group works together nicely, returning the expected result for an mpi_allreduce call).

Now, as I mentioned before, the apps I want to run in the spawned processes are third party mpi apps, and I don't think it will be possible to exchange messages with them from my app. So, how do I tell when the spawned processes have finished running? All I have to work with is the intercommunicator returned from the mpi_comm_spawn call...

Alex

2014-12-12 2:42 GMT-02:00 Alex A. Schmidt <a...@ufsm.br>:

Gilles,

Well, yes, I guess....

I'll do tests with the real third party apps and let you know. These are huge quantum chemistry codes (dftb+, siesta and Gaussian) which greatly benefit from a parallel environment. My code is just a front end to use those, but since we have a lot of data to process, it also benefits from a parallel environment.

Alex

2014-12-12 2:30 GMT-02:00 Gilles Gouaillardet <gilles.gouaillar...@iferc.org>:

Alex,

Just to make sure ... this is the behavior you expected, right?

Cheers,

Gilles

On 2014/12/12 13:27, Alex A. Schmidt wrote:

Gilles,

Ok, very nice!

When I execute

    do rank=1,3
       call MPI_Comm_spawn('hello_world',' ',5,MPI_INFO_NULL,rank,MPI_COMM_WORLD,my_intercomm,MPI_ERRCODES_IGNORE,status)
    enddo

I do get 15 instances of the 'hello_world' app running: 5 for each parent rank 1, 2 and 3.

Thanks a lot, Gilles.

Best regards,

Alex

2014-12-12 1:32 GMT-02:00 Gilles Gouaillardet <gilles.gouaillar...@iferc.org>:

Alex,

Just ask MPI_Comm_spawn to start (up to) 5 tasks via the maxprocs parameter:

    int MPI_Comm_spawn(char *command, char *argv[], int maxprocs, MPI_Info info,
                       int root, MPI_Comm comm, MPI_Comm *intercomm,
                       int array_of_errcodes[])

    INPUT PARAMETERS
        maxprocs - maximum number of processes to start (integer, significant only at root)

Cheers,

Gilles
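A side note on the loop quoted above: with comm = MPI_COMM_WORLD, MPI_Comm_spawn is collective over all parent ranks, so every rank must take part in every call and only the root's arguments matter, which is why three calls with maxprocs = 5 yield 15 children. If instead each parent rank should drive its own group of children independently (where the thread eventually ends up), the spawning communicator can simply be MPI_COMM_SELF, or a single-process communicator obtained with MPI_Comm_split using the rank as the color, as George suggests earlier in the thread. A minimal sketch, with 'hello_world' standing in for the real application and the disconnect pairing with the child-side sketch shown earlier:

    program spawn_per_rank
       use mpi
       implicit none
       integer :: ierr, rank, my_comm, intercomm

       call MPI_INIT(ierr)
       call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)

       ! One single-process communicator per rank: either duplicate
       ! MPI_COMM_SELF or split MPI_COMM_WORLD with a distinct color per rank.
       call MPI_COMM_SPLIT(MPI_COMM_WORLD, rank, 0, my_comm, ierr)

       ! Each parent rank now spawns its own group of 5 children independently
       ! of the other ranks (maxprocs = 5 plays the role of "mpirun -n 5").
       call MPI_COMM_SPAWN('hello_world', MPI_ARGV_NULL, 5, MPI_INFO_NULL, 0, &
                           my_comm, intercomm, MPI_ERRCODES_IGNORE, ierr)

       ! Assumes the children disconnect their parent communicator as well.
       call MPI_COMM_DISCONNECT(intercomm, ierr)
       call MPI_COMM_FREE(my_comm, ierr)
       call MPI_FINALIZE(ierr)
    end program spawn_per_rank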
On 2014/12/12 12:23, Alex A. Schmidt wrote:

Hello Gilles,

Thanks for your reply. The "env -i PATH=..." stuff seems to work!!!

    call system("sh -c 'env -i PATH=/usr/lib64/openmpi/bin:/bin mpirun -n 2 hello_world' ")

did produce the expected result with a simple openmpi "hello_world" code I wrote.

It might be harder though with the real third party app I have in mind. And I realize getting past a job scheduler with this approach might not work at all...

I have looked at the MPI_Comm_spawn call, but I failed to understand how it could help here. For instance, can I use it to launch an mpi app with the option "-n 5"?

Alex

2014-12-12 0:36 GMT-02:00 Gilles Gouaillardet <gilles.gouaillar...@iferc.org>:

Alex,

Can you try something like

    call system("sh -c 'env -i /.../mpirun -np 2 /.../app_name'")

-i starts with an empty environment. That being said, you might need to set a few environment variables manually:

    env -i PATH=/bin ...

And that being also said, this "trick" could be just a bad idea: you might be using a scheduler, and if you empty the environment, the scheduler will not be aware of the "inside" run.

On top of that, invoking system might fail depending on the interconnect you use.

Bottom line, I believe Ralph's reply is still valid, even if five years have passed: changing your workflow, or using MPI_Comm_spawn, is a much better approach.

Cheers,

Gilles

On 2014/12/12 11:22, Alex A. Schmidt wrote:

Dear OpenMPI users,

Regarding this previous post <http://www.open-mpi.org/community/lists/users/2009/06/9560.php> from 2009, I wonder if the reply from Ralph Castain is still valid. My need is similar but simpler: to make a system call from an openmpi fortran application to run a third party openmpi application. I don't need to exchange mpi messages with the application; I just need to read the resulting output file generated by it. I have tried to do the following system call from my fortran openmpi code:

    call system("sh -c 'mpirun -n 2 app_name'")

but I get

    **********************************************************
    Open MPI does not support recursive calls of mpirun
    **********************************************************

Is there a way to make this work?
Best regards,

Alex
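To close the loop on this original question: the MPI_Comm_spawn route recommended above amounts to replacing the failing system() call with something like the sketch below ('app_name' is the placeholder from the original post; completion detection and per-rank spawning are covered by the earlier sketches in the thread):

    program spawn_instead_of_system
       use mpi
       implicit none
       integer :: ierr, intercomm

       call MPI_INIT(ierr)

       ! Replaces: call system("sh -c 'mpirun -n 2 app_name'")
       ! maxprocs = 2 plays the role of "-n 2"; no nested mpirun is involved.
       call MPI_COMM_SPAWN('app_name', MPI_ARGV_NULL, 2, MPI_INFO_NULL, 0, &
                           MPI_COMM_SELF, intercomm, MPI_ERRCODES_IGNORE, ierr)

       ! As discussed above, knowing when the children are done still requires
       ! their cooperation (MPI_COMM_GET_PARENT + MPI_COMM_DISCONNECT on their side).
       call MPI_COMM_DISCONNECT(intercomm, ierr)

       call MPI_FINALIZE(ierr)
    end program spawn_instead_of_system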