Dear all,
I have been trying for several days to use advanced MPI-2 features in the
following scenario (a sketch of the master side is given right after the steps):
1) a master code A (of size NPA) spawns (MPI_Comm_spawn()) two slave
codes B (of size NPB) and C (of size NPC), providing intercomms
A-B and A-C;
2) I create intracomms AB and AC by merging these intercomms
(MPI_Intercomm_merge());
3) then I create intercomm AB-C by calling MPI_Intercomm_create(),
using AC as the bridge communicator:
MPI_Comm intercommABC;
A: MPI_Intercomm_create(intracommAB, 0, intracommAC, NPA, TAG, &intercommABC);
B: MPI_Intercomm_create(intracommAB, 0, MPI_COMM_NULL, 0, TAG, &intercommABC);
C: MPI_Intercomm_create(intracommC, 0, intracommAC, 0, TAG, &intercommABC);
In these calls, A0 and C0 play the role of local leader for AB and C
respectively, and C0 and A0 play the role of remote leader in the bridge
intracomm AC.
4) MPI_Barrier(intercommABC);
5) I merge intercomm AB-C into intracomm ABC;
6) MPI_Barrier(intracommABC);
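
To make the whole sequence concrete, here is a rough sketch of the master
side only (the slave executable names "slaveB"/"slaveC", the sizes NPB/NPC
and the TAG value are placeholders; B and C would perform the matching
merge / MPI_Intercomm_create / MPI_Barrier calls shown above):

/* sketch of master code A, placeholders only */
#include <mpi.h>

#define NPB 2
#define NPC 2
#define TAG 123

int main(int argc, char *argv[])
{
    MPI_Comm intercommAB, intercommAC;      /* from the two spawns        */
    MPI_Comm intracommAB, intracommAC;      /* after merging              */
    MPI_Comm intercommABC, intracommABC;    /* final communicators        */
    int npa;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &npa);    /* NPA */

    /* 1) spawn B and C, getting intercomms A-B and A-C */
    MPI_Comm_spawn("slaveB", MPI_ARGV_NULL, NPB, MPI_INFO_NULL, 0,
                   MPI_COMM_WORLD, &intercommAB, MPI_ERRCODES_IGNORE);
    MPI_Comm_spawn("slaveC", MPI_ARGV_NULL, NPC, MPI_INFO_NULL, 0,
                   MPI_COMM_WORLD, &intercommAC, MPI_ERRCODES_IGNORE);

    /* 2) merge the intercomms into intracomms AB and AC
       (high = 0 on the A side; B and C would pass high = 1,
       so that A's ranks come first, as assumed below) */
    MPI_Intercomm_merge(intercommAB, 0, &intracommAB);
    MPI_Intercomm_merge(intercommAC, 0, &intracommAC);

    /* 3) build intercomm AB-C using AC as bridge: local leader is A0
       (rank 0 in AB), remote leader is C0 (rank NPA in AC) */
    MPI_Intercomm_create(intracommAB, 0, intracommAC, npa, TAG, &intercommABC);

    /* 4) barrier on the new intercomm: this succeeds */
    MPI_Barrier(intercommABC);

    /* 5) merge AB-C into a single intracomm ABC */
    MPI_Intercomm_merge(intercommABC, 0, &intracommABC);

    /* 6) barrier on the merged intracomm: this is where the error occurs */
    MPI_Barrier(intracommABC);

    MPI_Finalize();
    return 0;
}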
My bug: these calls succeed, but when I try to use intracommABC for a
collective communication like MPI_Barrier(), I get the following error:
*** An error occurred in MPI_Barrier
*** on communicator
*** MPI_ERR_INTERN: internal error
*** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
I tried with Open MPI trunk, 1.5.3 and 1.5.4, and with MPICH2 1.4.1p1.
My code works perfectly if the intracomms A, B and C are obtained by
MPI_Comm_split() instead of MPI_Comm_spawn() (see the sketch below)!
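
For comparison, here is roughly how the working single-world variant starts
(the color layout and the rebuilding of the A-B and A-C intercomms over
MPI_COMM_WORLD are just one possible arrangement):

/* sketch of the single-world variant, assumed layout only */
#include <mpi.h>

int main(int argc, char *argv[])
{
    MPI_Comm intracommLocal;   /* plays the role of A, B or C */
    int rank, color;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* assumed layout: color 0 -> A, 1 -> B, 2 -> C */
    color = rank % 3;
    MPI_Comm_split(MPI_COMM_WORLD, color, rank, &intracommLocal);

    /* the intercomms A-B and A-C are then rebuilt with
       MPI_Intercomm_create() over MPI_COMM_WORLD as the bridge, and the
       rest of the sequence (merges, AB-C creation, barriers) is the same
       as above: in this setup the final MPI_Barrier() succeeds */

    MPI_Finalize();
    return 0;
}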
I found the same problem in a previous thread of the OMPI users mailing list:
=> http://www.open-mpi.org/community/lists/users/2011/06/16711.php
Is this bug/problem currently under investigation? :-)
I can provide detailed code, but the example posted by George Bosilca in that
previous thread produces the same error...
Thank you for your help...
--
Aurélien Esnard
University Bordeaux 1 / LaBRI / INRIA (France)