Do you have, by any chance, the actual code or a small reproducer? It would make the problem much easier to hunt down...
Thanks
Edgar

On 3/19/2012 8:12 PM, Rodrigo Oliveira wrote:
> Hi there.
>
> I am facing a very strange problem when using MPI_Barrier over an
> inter-communicator after some operations I describe below:
>
> 1) I start a server calling mpirun.
> 2) The server spawns 2 copies of a client using MPI_Comm_spawn, creating
> an inter-communicator between the two groups. The server group has 1
> process (let's call it A) and the client group has 2 processes (group B).
> 3) After that, I need to detach one of the processes (rank 0) in group B
> from the inter-communicator AB. To do that I do the following steps:
>
> Server side:
> .....
> tmp_inter_comm = client_comm.Create ( client_comm.Get_group ( ) );
> client_comm.Free ( );
> client_comm = tmp_inter_comm;
> .....
> client_comm.Barrier();
> .....
>
> Client side:
> ....
> rank = 0;
> tmp_inter_comm = server_comm.Create ( server_comm.Get_group ( ).Excl ( 1, &rank ) );
> server_comm.Free ( );
> server_comm = tmp_inter_comm;
> .....
> if (server_comm != MPI::COMM_NULL)
>     server_comm.Barrier();
>
>
> The problem: everything works fine until the call to Barrier. At that
> point, the server exits the barrier, but the client in group B does
> not. Observe that we have only one process inside B, because I used Excl
> to remove one process from the original group.
>
> p.s.: This occurs in version 1.5.4 with the C++ API.
>
> I am very concerned about this problem because this solution plays a
> very important role in my master's thesis.
>
> Is this an ompi problem or am I doing something wrong?
>
> Thanks in advance
>
> Rodrigo Oliveira
>
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
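
To make the group arithmetic in the quoted client-side snippet concrete, here is a minimal sketch in plain C++ that makes no MPI calls. The helpers `excl_ranks` and `new_rank_of` are hypothetical names introduced only for illustration; they model the documented effect of MPI_Group_excl (listed ranks are dropped, the rest keep their order and are renumbered from 0) and are not part of any MPI library. It may help to confirm which process is expected to remain in group B before looking at the barrier itself.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper, NOT an MPI call: models what MPI_Group_excl does to a
// group of `group_size` processes. Ranks listed in `excluded` are dropped;
// the remaining processes keep their relative order.
// In the result, index = new rank, value = old rank.
std::vector<int> excl_ranks(int group_size, const std::vector<int>& excluded) {
    // Like MPI_Group_excl, require every excluded rank to be valid.
    for (int r : excluded) {
        assert(r >= 0 && r < group_size);
    }
    std::vector<int> remaining;
    for (int old_rank = 0; old_rank < group_size; ++old_rank) {
        bool dropped = std::find(excluded.begin(), excluded.end(), old_rank)
                       != excluded.end();
        if (!dropped) {
            remaining.push_back(old_rank);
        }
    }
    return remaining;
}

// Hypothetical helper: new rank of `old_rank` in the reduced group, or -1 if
// it was excluded (such a process would get a null communicator from Create).
int new_rank_of(const std::vector<int>& remaining, int old_rank) {
    auto it = std::find(remaining.begin(), remaining.end(), old_rank);
    if (it == remaining.end()) {
        return -1;
    }
    return static_cast<int>(it - remaining.begin());
}
```

For the scenario in the message (group B has 2 processes and rank 0 is excluded), `excl_ranks(2, {0})` leaves a single process: old rank 1 becomes new rank 0, and old rank 0 is the one that should see the null communicator and skip the barrier.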