George,

I did not look over all the details of your test, but it looks to me like you are violating one of the requirements of MPI_Intercomm_create, namely that the two groups have to be disjoint. In your case the parent process(es) are part of both local intra-communicators, aren't they?
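To make the disjointness requirement concrete, here is a minimal sketch in plain C (no MPI calls) that models each group as an array of task numbers, using the intra0/intra1 layout from Frédéric's message quoted below. The helper names count_overlap and groups_are_disjoint are made up for illustration and are not part of any MPI library. In a real program, a process that holds both communicators could presumably perform the same check with MPI_Comm_group and MPI_Group_intersection.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an MPI call): count the task numbers that
 * appear in both groups. Each group is a plain array of task numbers
 * (T0 -> 0, T1 -> 1, ...) and is assumed to contain no duplicates. */
size_t count_overlap(const int *a, size_t na, const int *b, size_t nb)
{
    size_t shared = 0;

    /* A NULL pointer is only acceptable for an empty group. */
    assert(a != NULL || na == 0);
    assert(b != NULL || nb == 0);

    for (size_t i = 0; i < na; i++) {
        for (size_t j = 0; j < nb; j++) {
            if (a[i] == b[j]) {
                shared++;
                break; /* count each member of a at most once */
            }
        }
    }
    return shared;
}

/* Hypothetical helper: returns 1 when the two groups share no member,
 * which is the precondition the standard places on the local and
 * remote groups of an inter-communicator. */
int groups_are_disjoint(const int *a, size_t na, const int *b, size_t nb)
{
    return count_overlap(a, na, b, nb) == 0;
}
```

With intra0 = {0, 1, 2} and intra1 = {0, 3, 4}, count_overlap reports one shared member (T0), so the two groups are not disjoint. A pair such as {0, 1, 2} and {3, 4}, chosen here only as an illustration, passes the check.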
I just have MPI-1.1 at hand right now, but here is what it says:

----
Overlap of local and remote groups that are bound into an
inter-communicator is prohibited. If there is overlap, then the program is
erroneous and is likely to deadlock.
----

So the bottom line is that the two local intra-communicators being used have to be disjoint, and the bridgecomm needs to be a communicator in which at least one process from each of the two disjoint groups is able to talk to the other.

Interestingly, I did not find a sentence saying whether the two local leaders are allowed to be the same process or need to be separate processes...

Thanks
Edgar

On 6/7/2011 12:57 AM, George Bosilca wrote:
> Frederic,
>
> Attached you will find an example that is supposed to work. The main
> difference with your code is on T3, T4, where you have swapped the local and
> remote comm. As depicted in the picture attached below, during the 3rd step
> you will create the intercomm between ab and c (no overlap) using ac as a
> bridge communicator (here the two roots, a and c, can exchange messages).
>
> Based on the MPI 2.2 standard, especially on the paragraph in the PS, the
> attached code should have worked. Unfortunately, I could not run it
> successfully with either Open MPI trunk or MPICH2 1.4rc1.
>
> george.
>
> PS: Here is what the MPI standard states about MPI_Intercomm_create:
>> The function MPI_INTERCOMM_CREATE can be used to create an
>> inter-communicator from two existing intra-communicators, in the following
>> situation: At least one selected member from each group (the “group leader”)
>> has the ability to communicate with the selected member from the other
>> group; that is, a “peer” communicator exists to which both leaders belong,
>> and each leader knows the rank of the other leader in this peer
>> communicator. Furthermore, members of each group know the rank of their
>> leader.
>
> On Jun 1, 2011, at 05:00 , Frédéric Feyel wrote:
>
>> Hello,
>>
>> I have a problem using MPI_Intercomm_create.
>>
>> I have 5 tasks, let's say T0, T1, T2, T3, T4, resulting from two spawn
>> operations by T0.
>>
>> So I have two intra-communicators:
>>
>> intra0 contains: T0, T1, T2
>> intra1 contains: T0, T3, T4
>>
>> My goal is to make a collective loop to build a single intra-communicator
>> containing T0, T1, T2, T3, T4.
>>
>> I tried to do it using MPI_Intercomm_create and MPI_Intercomm_merge calls,
>> but without success (I always get MPI internal errors).
>>
>> What I am doing:
>>
>> on T0:
>> *******
>>
>> MPI_Intercomm_create(intra0,0,intra1,0,1,&new_com)
>>
>> on T1 and T2:
>> **************
>>
>> MPI_Intercomm_create(intra0,0,MPI_COMM_WORLD,0,1,&new_com)
>>
>> on T3 and T4:
>> **************
>>
>> MPI_Intercomm_create(intra1,0,MPI_COMM_WORLD,0,1,&new_com)
>>
>>
>> I'm certainly missing something. Could anybody help me to solve this
>> problem?
>>
>> Best regards,
>>
>> Frédéric.
>>
>> PS: of course I did an extensive web search without finding anything
>> useful on my problem.
>>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users

--
Edgar Gabriel
Assistant Professor
Parallel Software Technologies Lab      http://pstl.cs.uh.edu
Department of Computer Science          University of Houston
Philip G. Hoffman Hall, Room 524        Houston, TX-77204, USA
Tel: +1 (713) 743-3857                  Fax: +1 (713) 743-3335