George -- can you file a ticket about this?

On Jun 12, 2011, at 1:25 PM, George Bosilca wrote:

> Frédéric,
> 
> Based on the current version of the MPI standard, the two groups involved in
> the intercomm_create have to be disjoint, which means the two leaders cannot
> be the same process.
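The disjointness requirement can be checked mechanically. The sketch below is a plain-C model, not MPI code: it assumes each group is given as an array of hypothetical global task ids (T0 as 0, T1 as 1, and so on), and groups_disjoint is an illustrative name. In a real MPI program, a process that can see both groups (for example T0, which belongs to both intra0 and intra1) could obtain them with MPI_Comm_group, intersect them with MPI_Group_intersection, and check that MPI_Group_size of the result is 0.

```c
#include <stdbool.h>
#include <stddef.h>

/* Plain-C model, not an MPI call: a group is an array of hypothetical
   global task ids (T0 -> 0, T1 -> 1, ...). Returns true when no task id
   appears in both groups. */
bool groups_disjoint(const int *a, size_t na, const int *b, size_t nb)
{
    for (size_t i = 0; i < na; i++) {
        for (size_t j = 0; j < nb; j++) {
            if (a[i] == b[j]) {
                /* A shared task: the groups overlap, so passing them to
                   MPI_Intercomm_create would be erroneous. */
                return false;
            }
        }
    }
    return true;
}
```

For intra0 = {T0, T1, T2} and intra1 = {T0, T3, T4} the model returns false because T0 is in both; for the split T0, T1, T2 against T3, T4 it returns true. Since each leader must be a member of its own group, disjoint groups also rule out a shared leader.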
> 
> Regarding the issue in Open MPI, the problem is deep in our modex exchange
> (contact information). In the example I sent around a while back, the
> intercomm_create works, but the resulting communicator contains
> processes without this modex information. This leads to an error on the next
> collective communication.
> 
>  george.
> 
> On Jun 12, 2011, at 03:44 , Frédéric Feyel wrote:
> 
>> Dear all, thank you very much for the time spent looking at my problem.
>> 
>> After reading your contributions, it's not clear whether there is a bug in
>> Open MPI or not.
>> 
>> So I wrote a small self-contained program to analyse the behavior,
>> and the problem is still there.
>> 
>> I was wondering whether the local and remote leaders of the 2 groups could
>> be the same process. Unfortunately, I get an error in both cases (local
>> and remote leader identical or not).
>> 
>> What do you think of my small program?
>> 
>> Best regards,
>> 
>> Frédéric.
>> 
>> 
>> On Tue, 07 Jun 2011 10:31:51 -0500, Edgar Gabriel <gabr...@cs.uh.edu>
>> wrote:
>>> On 6/7/2011 10:23 AM, George Bosilca wrote:
>>>> 
>>>> On Jun 7, 2011, at 11:00 , Edgar Gabriel wrote:
>>>> 
>>>>> George,
>>>>> 
>>>>> I did not look over all the details of your test, but it looks to
>>>>> me like you are violating one of the requirements of
>>>>> intercomm_create, namely that the two groups have to be
>>>>> disjoint. In your case the parent process(es) are part of both
>>>>> local intra-communicators, aren't they?
>>>> 
>>>> The two groups of the two local communicators are disjoint. One
>>>> contains A,B while the other contains only C. The bridge communicator contains
>>>> A,C.
>>>> 
>>>> I'm confident my example is supposed to work. At least for Open MPI
>>>> the error is under the hood, as the resulting inter-communicator is
>>>> valid but contains NULL endpoints for the remote process.
>>> 
>>> I'll come back to that later; I am not yet convinced that your code is
>>> correct :-) Your local groups might be disjoint, but I am worried about
>>> the ranks of the remote leaders in your example. They cannot be 0 from
>>> both groups' perspective.
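Edgar's concern about the ranks can be made concrete: each leader passes as remote_leader the rank of the other leader in the peer communicator, and since the two leaders are distinct processes there, those ranks differ. The sketch models the peer communicator as an array indexed by rank; peer_rank is a hypothetical helper, and the example ordering of intra1 (T0, T3, T4 as ranks 0, 1, 2) is an assumption, since the real order depends on how intra1 was built.

```c
#include <stddef.h>

/* Plain-C model of a peer (bridge) communicator: peer[r] is the
   hypothetical global task id of the process with rank r.
   Returns the rank of `task` in the peer communicator, or -1 if the
   task is not a member (so it could not serve as a leader there). */
int peer_rank(const int *peer, size_t n, int task)
{
    for (size_t r = 0; r < n; r++) {
        if (peer[r] == task)
            return (int)r;
    }
    return -1;
}
```

Under that assumed ordering, T0 would pass remote_leader = 1 (the rank of T3) and T3 would pass 0 (the rank of T0); passing 0 on both sides would name T0 as both leaders.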
>>> 
>>>> 
>>>> Regarding the question of whether the two leaders should be separate
>>>> processes, you will not find any wording about this in the current
>>>> version of the standard. In MPI 1.1 there were two contradictory
>>>> sentences about this, one stating that the two groups have to be
>>>> disjoint, the other claiming that the two leaders can be the same
>>>> process. After discussion, the agreement was that the two groups have
>>>> to be disjoint, and the standard has been amended to match the agreement.
>>> 
>>> 
>>> I realized that this is a non-issue. If the two local groups are
>>> disjoint, there is no way that the two local leaders are the same
>>> process.
>>> 
>>> Thanks
>>> Edgar
>>> 
>>>> 
>>>> george.
>>>> 
>>>> 
>>>>> 
>>>>> I only have MPI-1.1 at hand right now, but here is what it says:
>>>>> ----
>>>>> 
>>>>> Overlap of local and remote groups that are bound into an 
>>>>> inter-communicator is prohibited. If there is overlap, then the
>>>>> program is erroneous and is likely to deadlock.
>>>>> 
>>>>> ----
>>>>> 
>>>>> So the bottom line is that the two local intra-communicators being
>>>>> used have to be disjoint, and the bridgecomm needs to be a
>>>>> communicator that contains at least one process from each of the two
>>>>> disjoint groups, so that those processes can talk to each other.
>>>>> Interestingly, I did not find a sentence stating whether the two
>>>>> local leaders are allowed to be the same process or need to be
>>>>> separate processes...
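Edgar's description of the bridge communicator translates into a membership check. This is a plain-C model with hypothetical task ids and an illustrative function name (bridge_ok), not an MPI API. It covers only the bridge part: each leader belongs to its own local group, and both leaders belong to the peer communicator. Group disjointness is a separate precondition and is not checked here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if `task` is one of the n ids in g. */
static bool contains(const int *g, size_t n, int task)
{
    for (size_t i = 0; i < n; i++) {
        if (g[i] == task)
            return true;
    }
    return false;
}

/* Plain-C model of the bridge requirement for MPI_Intercomm_create:
   each leader is a member of its own local group, and both leaders are
   members of the peer (bridge) communicator, so they can exchange
   messages through it. */
bool bridge_ok(const int *group_a, size_t na, int leader_a,
               const int *group_b, size_t nb, int leader_b,
               const int *peer, size_t np)
{
    return contains(group_a, na, leader_a) &&
           contains(group_b, nb, leader_b) &&
           contains(peer, np, leader_a) &&
           contains(peer, np, leader_b);
}
```

With ab = {T0, T1, T2}, c = {T3, T4} and the bridge ac = {T0, T3, T4}, leaders T0 and T3 pass the check; using ab itself as the bridge fails, because T3 is not in it.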
>>>>> 
>>>>> 
>>>>> Thanks
>>>>> Edgar
>>>>> 
>>>>> 
>>>>> On 6/7/2011 12:57 AM, George Bosilca wrote:
>>>>>> Frederic,
>>>>>> 
>>>>>> Attached you will find an example that is supposed to work. The
>>>>>> main difference from your code is on T3, T4, where you have
>>>>>> swapped the local and remote (peer) communicators. As depicted in
>>>>>> the picture attached below, during the 3rd step you create the
>>>>>> intercomm between ab and c (no overlap) using ac as a bridge
>>>>>> communicator (here the two roots, a and c, can exchange messages).
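Applied to Frédéric's five tasks, George's recipe means: group ab is intra0 (T0, T1, T2), group c is T3 and T4 on their own, and intra1 (T0, T3, T4) is the bridge. The plain-C table below records what each task would pass to MPI_Intercomm_create; it is a sketch, not tested MPI code. The Comm labels and args_for_task are hypothetical, and it assumes that T3 and T4 were created by one spawn (so their MPI_COMM_WORLD holds just the two of them, with T3 as rank 0) and that intra0 and intra1 both rank the parent T0 first.

```c
/* Hypothetical labels for the communicators in Frédéric's setup.
   OWN_WORLD is the calling task's own MPI_COMM_WORLD; for T3 and T4 it
   is assumed to contain exactly T3 (rank 0) and T4 (rank 1). */
typedef enum { OWN_WORLD, INTRA0, INTRA1 } Comm;

/* Arguments one task would pass to
   MPI_Intercomm_create(local_comm, local_leader, peer_comm,
                        remote_leader, tag, &new_com). */
typedef struct {
    Comm local_comm;
    int local_leader;   /* rank of this group's leader in local_comm */
    Comm peer_comm;     /* significant only at the local leader */
    int remote_leader;  /* other leader's rank in peer_comm (leader only) */
} IntercommArgs;

/* task is 0..4 for T0..T4. Assumes intra0 ranks T0, T1, T2 as 0, 1, 2
   and intra1 ranks T0, T3, T4 as 0, 1, 2. */
IntercommArgs args_for_task(int task)
{
    IntercommArgs a;
    if (task <= 2) {
        /* Group "ab" = intra0; its leader T0 reaches T3 through intra1. */
        a.local_comm = INTRA0;
        a.local_leader = 0;                              /* T0 */
        a.peer_comm = (task == 0) ? INTRA1 : OWN_WORLD;  /* ignored off-leader */
        a.remote_leader = 1;                             /* T3's rank in intra1 */
    } else {
        /* Group "c": the tasks' own world is the LOCAL communicator and
           intra1 is the PEER (the original call had these swapped). */
        a.local_comm = OWN_WORLD;
        a.local_leader = 0;                              /* T3 */
        a.peer_comm = INTRA1;
        a.remote_leader = 0;                             /* T0's rank in intra1 */
    }
    return a;
}
```

In MPI terms that reads MPI_Intercomm_create(intra0, 0, intra1, 1, 1, &new_com) on T0 and MPI_Intercomm_create(MPI_COMM_WORLD, 0, intra1, 0, 1, &new_com) on T3 and T4, with T1 and T2 unchanged. As George reports, this pattern still failed with Open MPI trunk and MPICH2 1.4rc1 at the time.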
>>>>>> 
>>>>>> Based on the MPI 2.2 standard, especially the paragraph quoted in
>>>>>> the PS, the attached code should work. Unfortunately, I
>>>>>> couldn't run it successfully with either Open MPI trunk or
>>>>>> MPICH2 1.4rc1.
>>>>>> 
>>>>>> george.
>>>>>> 
>>>>>> PS: Here is what the MPI standard states about
>>>>>> MPI_Intercomm_create:
>>>>>>> The function MPI_INTERCOMM_CREATE can be used to create an
>>>>>>> inter-communicator from two existing intra-communicators, in
>>>>>>> the following situation: At least one selected member from each
>>>>>>> group (the “group leader”) has the ability to communicate with
>>>>>>> the selected member from the other group; that is, a “peer”
>>>>>>> communicator exists to which both leaders belong, and each
>>>>>>> leader knows the rank of the other leader in this peer
>>>>>>> communicator. Furthermore, members of each group know the rank
>>>>>>> of their leader.
>>>>>> 
>>>>>> On Jun 1, 2011, at 05:00 , Frédéric Feyel wrote:
>>>>>> 
>>>>>>> Hello,
>>>>>>> 
>>>>>>> I have a problem using MPI_Intercomm_create.
>>>>>>> 
>>>>>>> I have 5 tasks, let's say T0, T1, T2, T3, T4, resulting from two
>>>>>>> spawn operations by T0.
>>>>>>> 
>>>>>>> So I have two intra-communicators:
>>>>>>> 
>>>>>>> intra0 contains: T0, T1, T2
>>>>>>> intra1 contains: T0, T3, T4
>>>>>>> 
>>>>>>> My goal is to write a collective loop that builds a single
>>>>>>> intra-communicator containing T0, T1, T2, T3, T4.
>>>>>>> 
>>>>>>> I tried to do it using MPI_Intercomm_create and
>>>>>>> MPI_Intercomm_merge calls, but without success (I always get MPI
>>>>>>> internal errors).
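For the final step, MPI_Intercomm_merge turns an inter-communicator into an intra-communicator, and its high argument decides which side receives the low ranks: the group that passes high = false comes first, and if both groups pass the same value the standard leaves the order unspecified. The plain-C model below (merge_order is a hypothetical name, and task ids are illustrative) shows the resulting order, assuming each group keeps its internal rank order.

```c
#include <stdbool.h>
#include <stddef.h>

/* Plain-C model of the rank order produced by MPI_Intercomm_merge.
   The group that passed high == false is placed first. If both groups
   passed the same value the order is unspecified, so the model returns
   false. On success, writes na + nb task ids into out. */
bool merge_order(const int *a, size_t na, bool high_a,
                 const int *b, size_t nb, bool high_b, int *out)
{
    if (high_a == high_b)
        return false; /* order of the union is arbitrary */

    const int *first = high_a ? b : a;
    size_t nfirst = high_a ? nb : na;
    const int *second = high_a ? a : b;
    size_t nsecond = high_a ? na : nb;

    size_t k = 0;
    for (size_t i = 0; i < nfirst; i++)
        out[k++] = first[i];
    for (size_t i = 0; i < nsecond; i++)
        out[k++] = second[i];
    return true;
}
```

With ab = {T0, T1, T2} passing false and {T3, T4} passing true, the merged order is T0, T1, T2, T3, T4. The same rule applies when merging the inter-communicator returned by MPI_Comm_spawn: if the parent passes false and the children pass true, the parent gets rank 0.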
>>>>>>> 
>>>>>>> What I am doing:
>>>>>>> 
>>>>>>> On T0:
>>>>>>> 
>>>>>>> MPI_Intercomm_create(intra0, 0, intra1, 0, 1, &new_com);
>>>>>>> 
>>>>>>> On T1 and T2:
>>>>>>> 
>>>>>>> MPI_Intercomm_create(intra0, 0, MPI_COMM_WORLD, 0, 1, &new_com);
>>>>>>> 
>>>>>>> On T3 and T4:
>>>>>>> 
>>>>>>> MPI_Intercomm_create(intra1, 0, MPI_COMM_WORLD, 0, 1, &new_com);
>>>>>>> 
>>>>>>> 
>>>>>>> I'm certainly missing something. Could anybody help me solve
>>>>>>> this problem?
>>>>>>> 
>>>>>>> Best regards,
>>>>>>> 
>>>>>>> Frédéric.
>>>>>>> 
>>>>>>> PS: Of course I did an extensive web search without finding
>>>>>>> anything useful on my problem.
>>>>>>> 
>>>>>>> _______________________________________________
>>>>>>> users mailing list
>>>>>>> us...@open-mpi.org
>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>> 
>>>>> 
>>>>> -- 
>>>>> Edgar Gabriel
>>>>> Assistant Professor
>>>>> Parallel Software Technologies Lab      http://pstl.cs.uh.edu
>>>>> Department of Computer Science          University of Houston
>>>>> Philip G. Hoffman Hall, Room 524        Houston, TX-77204, USA
>>>>> Tel: +1 (713) 743-3857                  Fax: +1 (713) 743-3335
>>>>> 
>> <spawn-example.c>
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

