Did you try to start the program with the --mca coll ^inter switch that
I mentioned? Collective dup for intercommunicators should work; it's
probably again the bcast over a communicator of size 1 that is causing
the hang, and you could avoid it with that flag.
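
For reference, a minimal sketch of two ways to pass that setting. The
executable name ./client and the process count are placeholders, not
details from this thread; the quotes around ^inter only keep shells that
give ^ a special meaning from touching it.

```shell
# Exclude the "inter" collective component for this run only.
# ./client and -np 3 are placeholders; substitute your own launch line.
mpirun --mca coll '^inter' -np 3 ./client

# Equivalent: set the same MCA parameter through the environment, so that
# every later mpirun started from this shell picks it up.
export OMPI_MCA_coll='^inter'
mpirun -np 3 ./client
```

Running ompi_info shows which coll components your installation was
built with, in case you want to confirm that "inter" is present at all.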

Also, if you could attach your test code, that would help in hunting
things down.

Thanks
Edgar

On 4/4/2012 2:18 PM, Thatyene Louise Alves de Souza Ramos wrote:
> Hi there.
> 
> I've made some tests related to the problem reported by Rodrigo, and I
> think (though I'd rather be wrong) that collective calls like Create and
> Dup do not work with intercommunicators. I tried this in the client group:
> 
> MPI::Intercomm tmp_inter_comm;
> 
> tmp_inter_comm = server_comm.Create(server_comm.Get_group().Excl(1, &rank));
> 
> if (server_comm.Get_rank() != rank)
>     server_comm = tmp_inter_comm.Dup();
> else
>     server_comm = MPI::COMM_NULL;
> 
> The server_comm is the original inter communicator with the server group.
> 
> I've noticed that the program hangs in the Dup call. It seems that the
> tmp_inter_comm created without one process still includes this process,
> because the other processes are waiting for it to call Dup too.
> 
> What do you think?
> 
> On Wed, Mar 28, 2012 at 6:03 PM, Edgar Gabriel <gabr...@cs.uh.edu>
> wrote:
> 
>     It just uses a different algorithm which avoids the bcast on a
>     communicator of size 1 (which is causing the problem here).
> 
>     Thanks
>     Edgar
> 
>     On 3/28/2012 12:08 PM, Rodrigo Oliveira wrote:
>     > Hi Edgar,
>     >
>     > I tested the execution of my code using the option -mca coll ^inter as
>     > you suggested and the program worked fine, even when I use 1 server
>     > instance.
>     >
>     > What does this parameter change? I did not find an explanation of
>     > what the coll inter module is used for.
>     >
>     > Thanks a lot for your attention and for the solution.
>     >
>     > Best regards,
>     >
>     > Rodrigo Oliveira
>     >
>     > On Tue, Mar 27, 2012 at 1:10 PM, Rodrigo Oliveira
>     > <rsilva.olive...@gmail.com> wrote:
>     >
>     >
>     >     Hi Edgar.
>     >
>     >     Thanks for the response. I just did not understand why the Barrier
>     >     works before I remove one of the client processes.
>     >
>     >     I tried it with 1 server and 3 clients and it worked properly.
>     >     After I removed 1 of the clients, it stopped working. So, the
>     >     removal is affecting the functionality of Barrier, I guess.
>     >
>     >     Does anyone have an idea?
>     >
>     >
>     >     On Mon, Mar 26, 2012 at 12:34 PM, Edgar Gabriel
>     >     <gabr...@cs.uh.edu> wrote:
>     >
>     >         I do not recall what the agreement was on how to treat the
>     >         size=1
>     >
>     >
>     >
>     >
>     >
>     > _______________________________________________
>     > users mailing list
>     > us...@open-mpi.org
>     > http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
Edgar Gabriel
Associate Professor
Parallel Software Technologies Lab      http://pstl.cs.uh.edu
Department of Computer Science          University of Houston
Philip G. Hoffman Hall, Room 524        Houston, TX-77204, USA
Tel: +1 (713) 743-3857                  Fax: +1 (713) 743-3335
