Alex,

The code looks good and is 100% compliant with the MPI standard.

I would change the way you create the subcommunicators in the parent. You
do a lot of unnecessary operations there: you can achieve exactly the same
outcome (one communicator per node) either by duplicating MPI_COMM_SELF or
by doing an MPI_Comm_split with the color equal to your rank.
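
For reference, a minimal Fortran sketch of the two alternatives (the name
single_comm is only illustrative, it is not taken from your parent.F):

    integer :: single_comm, rank, ierror

    ! option 1: every rank duplicates MPI_COMM_SELF
    call MPI_Comm_dup(MPI_COMM_SELF, single_comm, ierror)

    ! option 2: split MPI_COMM_WORLD using the rank itself as the color
    call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierror)
    call MPI_Comm_split(MPI_COMM_WORLD, rank, 0, single_comm, ierror)

Either way, each rank ends up with a communicator containing only itself,
which it can then pass to MPI_Comm_spawn.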

  George.


On Sun, Dec 14, 2014 at 2:20 AM, Alex A. Schmidt <a...@ufsm.br> wrote:

> Hi,
>
> Sorry, guys. I don't think the newbie here can follow any discussion
> beyond basic MPI...
>
> Anyway, if I add the pair
>
> call MPI_COMM_GET_PARENT(mpi_comm_parent,ierror)
> call MPI_COMM_DISCONNECT(mpi_comm_parent,ierror)
>
> on the spawnee side I get the proper response in the spawning processes.
>
> Please, take a look at the attached toy codes parent.F and child.F
> I've been playing with. 'mpirun -n 2 parent' seems to work as expected.
>
> Alex
>
> 2014-12-13 23:46 GMT-02:00 Gilles Gouaillardet <
> gilles.gouaillar...@gmail.com>:
>>
>> Alex,
>>
>> Are you calling MPI_Comm_disconnect in the 3 "master" tasks and with the
>> same remote communicator?
>>
>> I also read the man page again, and MPI_Comm_disconnect does not ensure
>> that the remote processes have finished or called MPI_Comm_disconnect, so
>> it might not be what you need.
>> George, can you please comment on that?
>>
>> Cheers,
>>
>> Gilles
>>
>> George Bosilca <bosi...@icl.utk.edu> wrote:
>> MPI_Comm_disconnect should be a local operation; there is no reason for
>> it to deadlock. I looked at the code and everything is local with the
>> exception of a call to PMIX.FENCE. Can you attach to your deadlocked
>> processes and confirm that they are stopped in the pmix.fence?
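>>
>> (One way to check, assuming gdb is available on the nodes, is to attach
>> to one of the hung processes and inspect the backtrace:
>>
>>     gdb -p <pid of the hung process>
>>     (gdb) thread apply all bt
>>
>> and see whether the frames point at the PMIx fence.)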
>>
>>   George.
>>
>>
>> On Sat, Dec 13, 2014 at 8:47 AM, Alex A. Schmidt <a...@ufsm.br> wrote:
>>
>>> Hi
>>>
>>> Sorry, I was calling mpi_comm_disconnect on the group comm handle, not
>>> on the intercomm handle returned from the spawn call, as it should be.
>>>
>>> Well, calling the disconnect on the intercomm handle does halt the
>>> spawner side, but the wait is never completed since, as George points
>>> out, there is no disconnect call being made on the spawnee side... and
>>> that brings me back to the beginning of the problem since, being a
>>> third-party app, that call would never be there. I guess an MPI wrapper
>>> to deal with that could be made for the app, but I feel the wrapper
>>> itself would, in the end, face the same problem we face right now.
>>>
>>> My application is a genetic algorithm code that searches for optimal
>>> configurations (minimum or maximum energy) of clusters of atoms. The
>>> workflow bottleneck is the calculation of the cluster energy. For the
>>> cases in which an analytical potential is available, the calculation can
>>> be done internally and the workload is distributed among slave nodes
>>> from a master node. The same is done when an analytical potential is not
>>> available and the energy calculation must be performed externally by a
>>> quantum chemistry code like dftb+, siesta or Gaussian. So far, we have
>>> been running these codes in serial mode. Needless to say, we could do a
>>> lot better if they could be executed in parallel.
>>>
>>> I am not familiar with DRMAA, but it seems to be the right choice for
>>> dealing with job schedulers, as it covers the ones I am interested in
>>> (PBS/Torque and LoadLeveler).
>>>
>>> Alex
>>>
>>> 2014-12-13 7:49 GMT-02:00 Gilles Gouaillardet <
>>> gilles.gouaillar...@gmail.com>:
>>>>
>>>> George is right about the semantics.
>>>>
>>>> However, I am surprised it returns immediately...
>>>> That should either work or hang, imho.
>>>>
>>>> The second point is no longer MPI related; it is batch-manager specific.
>>>>
>>>> You will likely find a submit parameter to make the command block until
>>>> the job completes, or you can write your own wrapper.
>>>> Or you can retrieve the jobid and run qstat periodically to get the job
>>>> state (a rough sketch of this is below).
>>>> If an API is available, that is also an option.
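>>>>
>>>> For example, something along these lines in Fortran 2008 (untested;
>>>> it assumes a PBS/Torque-style qstat that exits non-zero once the job
>>>> has left the queue, and a character variable jobid captured at
>>>> submission time):
>>>>
>>>>     integer :: stat
>>>>     do
>>>>         call execute_command_line("qstat "//trim(jobid)//" > /dev/null 2>&1", &
>>>>                                   exitstat=stat)
>>>>         if (stat /= 0) exit   ! job is done (or at least gone)
>>>>         call execute_command_line("sleep 30")
>>>>     end do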
>>>>
>>>> Cheers,
>>>>
>>>> Gilles
>>>>
>>>> George Bosilca <bosi...@icl.utk.edu> wrote:
>>>> You have to call MPI_Comm_disconnect on both sides of the
>>>> intercommunicator. On the spawner processes you should call it on the
>>>> intercommunicator returned by MPI_Comm_spawn, while on the spawnees you
>>>> should call it on the communicator returned by MPI_Comm_get_parent.
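>>>>
>>>> Roughly, in Fortran (a sketch only, not a drop-in for your code; the
>>>> names command, nprocs, intercomm and parent_comm are illustrative):
>>>>
>>>>     integer :: intercomm, parent_comm, ierror
>>>>
>>>>     ! spawner side
>>>>     call MPI_Comm_spawn(command, MPI_ARGV_NULL, nprocs, MPI_INFO_NULL, &
>>>>                         0, MPI_COMM_SELF, intercomm, MPI_ERRCODES_IGNORE, ierror)
>>>>     ! ... once the children have finished their work ...
>>>>     call MPI_Comm_disconnect(intercomm, ierror)
>>>>
>>>>     ! spawnee side
>>>>     call MPI_Comm_get_parent(parent_comm, ierror)
>>>>     ! ... do the work ...
>>>>     call MPI_Comm_disconnect(parent_comm, ierror)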
>>>>
>>>>   George.
>>>>
>>>> On Dec 12, 2014, at 20:43, Alex A. Schmidt <a...@ufsm.br> wrote:
>>>>
>>>> Gilles,
>>>>
>>>> MPI_Comm_disconnect seems to work, but not quite:
>>>> the call returns almost immediately while
>>>> the spawned processes keep piling up in the background
>>>> until they are all done...
>>>>
>>>> I think using system('env -i qsub...') to launch the third-party apps
>>>> would send the execution of every call back to the scheduler
>>>> queue. How would I track each one for completion?
>>>>
>>>> Alex
>>>>
>>>> 2014-12-12 22:35 GMT-02:00 Gilles Gouaillardet <
>>>> gilles.gouaillar...@gmail.com>:
>>>>>
>>>>> Alex,
>>>>>
>>>>> You need MPI_Comm_disconnect at least.
>>>>> I am not sure whether this is 100% correct or even working.
>>>>>
>>>>> If you are using third-party apps, why don't you do something like
>>>>> system("env -i qsub ...")
>>>>> with the right options to make qsub block, or manually wait for the
>>>>> end of the job (see the sketch below)?
>>>>>
>>>>> That looks like a much cleaner and simpler approach to me.
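>>>>>
>>>>> For example, with Torque/PBS something like the following should make
>>>>> the submission block until the job finishes (the -W block=true option
>>>>> is scheduler dependent, so treat this as an untested illustration):
>>>>>
>>>>>     call system("env -i PATH=/usr/bin:/bin qsub -W block=true job.sh")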
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Gilles
>>>>>
>>>>> "Alex A. Schmidt" <a...@ufsm.br> wrote:
>>>>> Hello Gilles,
>>>>>
>>>>> Ok, I believe I have a simple toy app running as I think it should:
>>>>> 'n' parent processes running under MPI_COMM_WORLD, each one
>>>>> spawning its own 'm' child processes (each child group works
>>>>> together nicely, returning the expected result for an MPI_Allreduce
>>>>> call).
>>>>>
>>>>> Now, as I mentioned before, the apps I want to run in the spawned
>>>>> processes are third-party MPI apps, and I don't think it will be
>>>>> possible to exchange messages with them from my app. So how do I tell
>>>>> when the spawned processes have finished running? All I have to work
>>>>> with is the intercommunicator returned from the MPI_Comm_spawn call...
>>>>>
>>>>> Alex
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> 2014-12-12 2:42 GMT-02:00 Alex A. Schmidt <a...@ufsm.br>:
>>>>>>
>>>>>> Gilles,
>>>>>>
>>>>>> Well, yes, I guess....
>>>>>>
>>>>>> I'll do tests with the real third party apps and let you know.
>>>>>> These are huge quantum chemistry codes (dftb+, siesta and Gaussian)
>>>>>> which greatly benefit from a parallel environment. My code is just
>>>>>> a front end to use them, but since we have a lot of data to process
>>>>>> it also benefits from a parallel environment.
>>>>>>
>>>>>> Alex
>>>>>>
>>>>>>
>>>>>> 2014-12-12 2:30 GMT-02:00 Gilles Gouaillardet <
>>>>>> gilles.gouaillar...@iferc.org>:
>>>>>>>
>>>>>>>  Alex,
>>>>>>>
>>>>>>> just to make sure ...
>>>>>>> this is the behavior you expected, right ?
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Gilles
>>>>>>>
>>>>>>>
>>>>>>> On 2014/12/12 13:27, Alex A. Schmidt wrote:
>>>>>>>
>>>>>>> Gilles,
>>>>>>>
>>>>>>> Ok, very nice!
>>>>>>>
>>>>>>> When I execute
>>>>>>>
>>>>>>> do rank=1,3
>>>>>>>     call MPI_Comm_spawn('hello_world',' ',5,MPI_INFO_NULL,rank,MPI_COMM_WORLD,my_intercomm,MPI_ERRCODES_IGNORE,status)
>>>>>>> enddo
>>>>>>>
>>>>>>> I do get 15 instances of the 'hello_world' app running: 5 for each
>>>>>>> parent rank 1, 2 and 3.
>>>>>>>
>>>>>>> Thanks a lot, Gilles.
>>>>>>>
>>>>>>> Best regards,
>>>>>>>
>>>>>>> Alex
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 2014-12-12 1:32 GMT-02:00 Gilles Gouaillardet <gilles.gouaillar...@iferc.org>:
>>>>>>>
>>>>>>>  Alex,
>>>>>>>
>>>>>>> just ask MPI_Comm_spawn to start (up to) 5 tasks via the maxprocs
>>>>>>> parameter :
>>>>>>>
>>>>>>>        int MPI_Comm_spawn(char *command, char *argv[], int maxprocs,
>>>>>>> MPI_Info info,
>>>>>>>                          int root, MPI_Comm comm, MPI_Comm *intercomm,
>>>>>>>                          int array_of_errcodes[])
>>>>>>>
>>>>>>> INPUT PARAMETERS
>>>>>>>        maxprocs
>>>>>>>               - maximum number of processes to start (integer, 
>>>>>>> significant
>>>>>>> only at root)
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Gilles
>>>>>>>
>>>>>>>
>>>>>>> On 2014/12/12 12:23, Alex A. Schmidt wrote:
>>>>>>>
>>>>>>> Hello Gilles,
>>>>>>>
>>>>>>> Thanks for your reply. The "env -i PATH=..." stuff seems to work!!!
>>>>>>>
>>>>>>> call system("sh -c 'env -i PATH=/usr/lib64/openmpi/bin:/bin mpirun -n 2
>>>>>>> hello_world' ")
>>>>>>>
>>>>>>> did produce the expected result with a simple Open MPI "hello_world"
>>>>>>> code I wrote.
>>>>>>>
>>>>>>> It might be harder, though, with the real third-party app I have in
>>>>>>> mind. And I realize that getting this approach past a job scheduler
>>>>>>> might not work at all...
>>>>>>>
>>>>>>> I have looked at the MPI_Comm_spawn call, but I failed to understand
>>>>>>> how it could help here. For instance, can I use it to launch an MPI
>>>>>>> app with the option "-n 5"?
>>>>>>>
>>>>>>> Alex
>>>>>>>
>>>>>>> 2014-12-12 0:36 GMT-02:00 Gilles Gouaillardet <gilles.gouaillar...@iferc.org>:
>>>>>>>
>>>>>>>  Alex,
>>>>>>>
>>>>>>> can you try something like
>>>>>>> call system("sh -c 'env -i /.../mpirun -np 2 /.../app_name'")
>>>>>>>
>>>>>>> -i starts with an empty environment
>>>>>>> that being said, you might need to set a few environment variables
>>>>>>> manually:
>>>>>>> env -i PATH=/bin ...
>>>>>>>
>>>>>>> that said, this "trick" could also just be a bad idea:
>>>>>>> you might be using a scheduler, and if you empty the environment, the
>>>>>>> scheduler will not be aware of the "inside" run.
>>>>>>>
>>>>>>> on top of that, invoking system might fail depending on the
>>>>>>> interconnect you use.
>>>>>>>
>>>>>>> Bottom line, I believe Ralph's reply is still valid, even if five
>>>>>>> years have passed: changing your workflow, or using MPI_Comm_spawn,
>>>>>>> is a much better approach.
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Gilles
>>>>>>>
>>>>>>> On 2014/12/12 11:22, Alex A. Schmidt wrote:
>>>>>>>
>>>>>>> Dear OpenMPI users,
>>>>>>>
>>>>>>> Regarding this previous post from 2009
>>>>>>> <http://www.open-mpi.org/community/lists/users/2009/06/9560.php>,
>>>>>>> I wonder if the reply from Ralph Castain is still valid. My need is
>>>>>>> similar but simpler: to make a system call from an Open MPI Fortran
>>>>>>> application to run a third-party Open MPI application. I don't need
>>>>>>> to exchange MPI messages with the application; I just need to read
>>>>>>> the resulting output file generated by it. I have tried to do the
>>>>>>> following system call from my Fortran Open MPI code:
>>>>>>>
>>>>>>> call system("sh -c 'mpirun -n 2 app_name")
>>>>>>>
>>>>>>> but I get
>>>>>>>
>>>>>>> **********************************************************
>>>>>>>
>>>>>>> Open MPI does not support recursive calls of mpirun
>>>>>>>
>>>>>>> **********************************************************
>>>>>>>
>>>>>>> Is there a way to make this work?
>>>>>>>
>>>>>>> Best regards,
>>>>>>>
>>>>>>> Alex
>>>>>>>