You have to call MPI_Comm_disconnect on both sides of the intercommunicator. On
the spawner processes you should call it on the intercomm, while on the spawnees
you should call it on the communicator returned by MPI_Comm_get_parent.
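Something along these lines (an untested sketch; error checking is omitted and
"worker" is just an illustrative executable name):

  /* spawner side: disconnect the intercomm returned by MPI_Comm_spawn */
  #include <mpi.h>
  int main(int argc, char **argv)
  {
      MPI_Comm intercomm;
      MPI_Init(&argc, &argv);
      MPI_Comm_spawn("worker", MPI_ARGV_NULL, 2, MPI_INFO_NULL,
                     0, MPI_COMM_SELF, &intercomm, MPI_ERRCODES_IGNORE);
      /* ... exchange messages with the children over intercomm ... */
      MPI_Comm_disconnect(&intercomm);
      MPI_Finalize();
      return 0;
  }

  /* spawnee side: disconnect the communicator from MPI_Comm_get_parent */
  #include <mpi.h>
  int main(int argc, char **argv)
  {
      MPI_Comm parent;
      MPI_Init(&argc, &argv);
      MPI_Comm_get_parent(&parent);
      /* ... exchange messages with the parent over parent ... */
      MPI_Comm_disconnect(&parent);
      MPI_Finalize();
      return 0;
  }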
George.
> On Dec 12, 2014, at 20:43, Alex A. Schmidt wrote:
>
> Gilles,
>
Hi,
On 13.12.2014 at 02:43, Alex A. Schmidt wrote:
> MPI_comm_disconnect seems to work, but not quite.
> The call to it returns almost immediately while
> the spawned processes keep piling up in the background
> until they are all done...
>
> I think system('env -i qsub...') to launch the third part
George is right about the semantics.
However, I am surprised it returns immediately...
That should either work or hang, IMHO.
The second point is no longer MPI related; it is batch-manager specific.
You will likely find a submit parameter to make the command block until the job
completes. Or you can w
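For example, SGE's qsub has a -sync y option that makes the call block until the
submitted job finishes (that is an SGE-specific assumption; other batch managers
have their own equivalent, so check the qsub man page on your system). A sketch:

  /* sketch only: block until the submitted job completes.
     "-sync y" is SGE-specific and "third_part.sh" is illustrative. */
  #include <stdio.h>
  #include <stdlib.h>

  int main(void)
  {
      int rc = system("env -i qsub -sync y third_part.sh");
      printf("qsub returned %d\n", rc);
      return 0;
  }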
Hi
Sorry, I was calling mpi_comm_disconnect on the group comm handle, not
on the intercomm handle returned from the spawn call, as it should be.
Well, calling the disconnect on the intercomm handle does halt the spawner
side, but the wait is never completed since, as George points out, there is
MPI_Comm_disconnect should be a local operation; there is no reason for it
to deadlock. I looked at the code and everything is local with the
exception of a call to PMIX.FENCE. Can you attach to your deadlocked
processes and confirm that they are stopped in the pmix.fence?
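(For instance, attach with "gdb -p <pid>" to one of the hung ranks and run
"thread apply all bt"; the backtrace should show whether it is sitting in the
PMIx fence.)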
George.
On Sat, Dec
Alex,
Are you calling MPI_Comm_disconnect in the 3 "master" tasks and with the same
remote communicator?
I also read the man page again, and MPI_Comm_disconnect does not ensure the
remote processes have finished or called MPI_Comm_disconnect, so that might not
be the thing you need.
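If you do need to know that the spawned job has finished, one possible sketch
(building on the spawn/disconnect skeleton shown earlier in the thread) is to
synchronize on the intercommunicator right before disconnecting; an MPI_Barrier
on an intercommunicator only returns after all processes of the remote group
have entered it:

  /* parent side, after MPI_Comm_spawn returned intercomm */
  MPI_Barrier(intercomm);          /* completes once every child has entered it */
  MPI_Comm_disconnect(&intercomm);

  /* child side, at the very end of its work */
  MPI_Comm parent;
  MPI_Comm_get_parent(&parent);
  MPI_Barrier(parent);
  MPI_Comm_disconnect(&parent);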
George, c