Hi

Sorry, I was calling MPI_Comm_disconnect on the group comm handle, not
on the intercomm handle returned from the spawn call, as it should be.

Well, calling the disconnect on the intercomm handle does halt the spawner
side, but the wait never completes since, as George points out, there is no
disconnect call being made on the spawnee side... and that brings me back
to the beginning of the problem: being a third party app, that call would
never be there. I guess an MPI wrapper to deal with that could be made for
the app, but I feel the wrapper itself would, in the end, face the same
problem we face right now.
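
Just to make the wrapper idea concrete, here is a minimal, untested sketch of
what such a spawnee-side wrapper might look like; the "run the third party
code" step is purely hypothetical and is exactly the open question:

    program third_party_wrapper
       use mpi
       implicit none
       integer :: ierr, parent

       call MPI_Init(ierr)
       call MPI_Comm_get_parent(parent, ierr)

       ! ... somehow run (or link against) the third party code here ...
       ! (hypothetical step; this is the part we do not know how to do)

       if (parent /= MPI_COMM_NULL) then
          ! matches the disconnect the spawner makes on its intercomm
          call MPI_Comm_disconnect(parent, ierr)
       end if
       call MPI_Finalize(ierr)
    end program third_party_wrapper

On the spawner side the matching call would be MPI_Comm_disconnect on the
intercomm returned by MPI_Comm_spawn, as George described.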

My application is a genetic algorithm code that searches for optimal
configurations (minimum or maximum energy) of clusters of atoms. The
workflow bottleneck is the calculation of the cluster energy. For cases
where an analytical potential is available, the calculation can be done
internally and the workload is distributed among slave nodes from a master
node. The same is done when no analytical potential is available and the
energy calculation must be done externally by a quantum chemistry code
such as dftb+, siesta or Gaussian. So far, we have been running these codes
in serial mode. Needless to say, we could do a lot better if they could be
executed in parallel.

I am not familiar with DRMAA, but it seems to be the right choice for
dealing with job schedulers, as it covers the ones I am interested in
(PBS/Torque and LoadLeveler).
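
In the meantime, Gilles' suggestion of capturing the job id from qsub and
polling qstat until the job leaves the queue looks doable even from Fortran.
A rough, untested sketch (the script name, the scratch file, the 30-second
interval and the assumption that qstat returns a non-zero status once the
job is gone are all just my guesses):

    program wait_for_job
       implicit none
       character(len=64) :: jobid
       integer           :: stat

       ! qsub prints the job id on stdout; capture it in a scratch file
       call execute_command_line("qsub job.pbs > jobid.txt", exitstat=stat)
       open(10, file="jobid.txt")
       read(10, '(a)') jobid
       close(10)

       ! poll until qstat no longer reports the job
       do
          call execute_command_line("qstat "//trim(jobid)//" > /dev/null 2>&1", &
                                    exitstat=stat)
          if (stat /= 0) exit
          call execute_command_line("sleep 30")
       end do
    end program wait_for_job

Whether the scheduler allows submitting a new job from inside a running MPI
process is, of course, a separate question.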

Alex

2014-12-13 7:49 GMT-02:00 Gilles Gouaillardet <gilles.gouaillar...@gmail.com>:
>
> George is right about the semantic
>
> However I am surprised it returns immediately...
> That should either work or hang, imho.
>
> The second point is no longer MPI related; it is batch manager specific.
>
> You will likely find a submit parameter to make the command block until
> the job completes. Or you can write your own wrapper.
> Or you can retrieve the job id and run qstat periodically to get the job state.
> If an API is available, that is also an option.
>
> Cheers,
>
> Gilles
>
> George Bosilca <bosi...@icl.utk.edu> wrote:
> You have to call MPI_Comm_disconnect on both sides of the
> intercommunicator. On the spawner processes you should call it on the
> intercomm, while on the spawnees you should call it on the communicator
> returned by MPI_Comm_get_parent.
>
>   George.
>
> On Dec 12, 2014, at 20:43 , Alex A. Schmidt <a...@ufsm.br> wrote:
>
> Gilles,
>
> MPI_Comm_disconnect seems to work, but not quite.
> The call returns almost immediately while
> the spawned processes keep piling up in the background
> until they are all done...
>
> I think using system('env -i qsub...') to launch the third party apps
> would send the execution of every call back to the scheduler
> queue. How would I track each one for completion?
>
> Alex
>
> 2014-12-12 22:35 GMT-02:00 Gilles Gouaillardet <gilles.gouaillar...@gmail.com>:
>>
>> Alex,
>>
>> You need MPI_Comm_disconnect at least.
>> I am not sure if this is 100% correct or even working.
>>
>> If you are using third party apps, why don't you do something like
>> system("env -i qsub ...")
>> with the right options to make qsub blocking, or manually wait for the
>> end of the job?
>>
>> That looks like a much cleaner and simpler approach to me.
>>
>> Cheers,
>>
>> Gilles
>>
>> "Alex A. Schmidt" <a...@ufsm.br> wrote:
>> Hello Gilles,
>>
>> Ok, I believe I have a simple toy app running as I think it should:
>> 'n' parent processes running under mpi_comm_world, each one
>> spawning its own 'm' child processes (each child group works
>> together nicely, returning the expected result for an MPI_Allreduce
>> call).
>>
>> Now, as I mentioned before, the apps I want to run in the spawned
>> processes are third party MPI apps and I don't think it will be possible
>> to exchange messages with them from my app. So, how do I tell
>> when the spawned processes have finished running? All I have to work
>> with is the intercommunicator returned from the MPI_Comm_spawn call...
>>
>> Alex
>>
>>
>>
>>
>> 2014-12-12 2:42 GMT-02:00 Alex A. Schmidt <a...@ufsm.br>:
>>>
>>> Gilles,
>>>
>>> Well, yes, I guess....
>>>
>>> I'll do tests with the real third party apps and let you know.
>>> These are huge quantum chemistry codes (dftb+, siesta and Gaussian)
>>> which greatly benefit from a parallel environment. My code is just
>>> a front end to use those, but since we have a lot of data to process
>>> it also benefits from a parallel environment.
>>>
>>> Alex
>>>
>>>
>>> 2014-12-12 2:30 GMT-02:00 Gilles Gouaillardet <gilles.gouaillar...@iferc.org>:
>>>>
>>>>  Alex,
>>>>
>>>> just to make sure ...
>>>> this is the behavior you expected, right ?
>>>>
>>>> Cheers,
>>>>
>>>> Gilles
>>>>
>>>>
>>>> On 2014/12/12 13:27, Alex A. Schmidt wrote:
>>>>
>>>> Gilles,
>>>>
>>>> Ok, very nice!
>>>>
>>>> When I execute
>>>>
>>>> do rank=1,3
>>>>    call MPI_Comm_spawn('hello_world', ' ', 5, MPI_INFO_NULL, rank, &
>>>>                        MPI_COMM_WORLD, my_intercomm, MPI_ERRCODES_IGNORE, status)
>>>> enddo
>>>>
>>>> I do get 15 instances of the 'hello_world' app running: 5 for each parent
>>>> rank 1, 2 and 3.
>>>>
>>>> Thanks a lot, Gilles.
>>>>
>>>> Best regards,
>>>>
>>>> Alex
>>>>
>>>>
>>>>
>>>>
>>>> 2014-12-12 1:32 GMT-02:00 Gilles Gouaillardet <gilles.gouaillar...@iferc.org>:
>>>>
>>>>  Alex,
>>>>
>>>> just ask MPI_Comm_spawn to start (up to) 5 tasks via the maxprocs
>>>> parameter :
>>>>
>>>>        int MPI_Comm_spawn(char *command, char *argv[], int maxprocs,
>>>>                           MPI_Info info, int root, MPI_Comm comm,
>>>>                           MPI_Comm *intercomm, int array_of_errcodes[])
>>>>
>>>> INPUT PARAMETERS
>>>>        maxprocs
>>>>               - maximum number of processes to start (integer,
>>>>                 significant only at root)
>>>>
>>>> Cheers,
>>>>
>>>> Gilles
>>>>
>>>>
>>>> On 2014/12/12 12:23, Alex A. Schmidt wrote:
>>>>
>>>> Hello Gilles,
>>>>
>>>> Thanks for your reply. The "env -i PATH=..." stuff seems to work!!!
>>>>
>>>> call system("sh -c 'env -i PATH=/usr/lib64/openmpi/bin:/bin mpirun -n 2
>>>> hello_world' ")
>>>>
>>>> did produce the expected result with a simple Open MPI "hello_world" code I
>>>> wrote.
>>>>
>>>> It might be harder, though, with the real third party app I have in mind.
>>>> And I realize getting this past a job scheduler might not
>>>> work at all...
>>>>
>>>> I have looked at the MPI_Comm_spawn call but I failed to understand how it
>>>> could help here. For instance, can I use it to launch an MPI app with the
>>>> option "-n 5"?
>>>>
>>>> Alex
>>>>
>>>> 2014-12-12 0:36 GMT-02:00 Gilles Gouaillardet <gilles.gouaillar...@iferc.org>:
>>>>
>>>>  Alex,
>>>>
>>>> can you try something like
>>>> call system("sh -c 'env -i /.../mpirun -np 2 /.../app_name'")
>>>>
>>>> The -i option starts with an empty environment.
>>>> That being said, you might need to set a few environment variables
>>>> manually:
>>>> env -i PATH=/bin ...
>>>>
>>>> That also being said, this "trick" could just be a bad idea:
>>>> you might be using a scheduler, and if you empty the environment, the
>>>> scheduler will not be aware of the "inside" run.
>>>>
>>>> On top of that, invoking system might fail depending on the interconnect
>>>> you use.
>>>>
>>>> Bottom line, I believe Ralph's reply is still valid, even if five years
>>>> have passed: changing your workflow or using MPI_Comm_spawn is a much
>>>> better approach.
>>>>
>>>> Cheers,
>>>>
>>>> Gilles
>>>>
>>>> On 2014/12/12 11:22, Alex A. Schmidt wrote:
>>>>
>>>> Dear OpenMPI users,
>>>>
>>>> Regarding this previous post from 2009
>>>> <http://www.open-mpi.org/community/lists/users/2009/06/9560.php>,
>>>> I wonder if the reply from Ralph Castain is still valid. My need is similar
>>>> but somewhat simpler: to make a system call from an Open MPI Fortran
>>>> application to run a third party Open MPI application. I don't need to
>>>> exchange MPI messages with the application; I just need to read the output
>>>> file it generates. I have tried to make the following system call from my
>>>> Fortran Open MPI code:
>>>>
>>>> call system("sh -c 'mpirun -n 2 app_name")
>>>>
>>>> but I get
>>>>
>>>> **********************************************************
>>>>
>>>> Open MPI does not support recursive calls of mpirun
>>>>
>>>> **********************************************************
>>>>
>>>> Is there a way to make this work?
>>>>
>>>> Best regards,
>>>>
>>>> Alex
>>>>
>>>>
>>>>
>>>>