Dear all,

For several days I have been trying to use advanced MPI-2 features in the following scenario:

  1) a master code A (of size NPA) spawns (MPI_Comm_spawn()) two slave
     codes B (of size NPB) and C (of size NPC), providing intercomms A-B and A-C;
  2) I create intracomms AB and AC by merging these intercomms (a rough
     sketch of steps 1 and 2 is given just below);
  3) I then create intercomm AB-C by calling MPI_Intercomm_create(),
     using AC as the bridge communicator (the exact calls are shown after
     the sketch).
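
For reference, steps 1 and 2 look roughly like this (simplified sketch, not my
exact code; executable names, variable names and error-code handling are
placeholders):

  /* In A: spawn B and C, then merge each intercomm into an intracomm */
  MPI_Comm interAB, interAC, intracommAB, intracommAC;
  MPI_Comm_spawn("./B", MPI_ARGV_NULL, NPB, MPI_INFO_NULL, 0,
                 MPI_COMM_WORLD, &interAB, MPI_ERRCODES_IGNORE);
  MPI_Comm_spawn("./C", MPI_ARGV_NULL, NPC, MPI_INFO_NULL, 0,
                 MPI_COMM_WORLD, &interAC, MPI_ERRCODES_IGNORE);
  MPI_Intercomm_merge(interAB, 0, &intracommAB);   /* high = 0: A ranks first   */
  MPI_Intercomm_merge(interAC, 0, &intracommAC);

  /* In B (and analogously in C, giving intracommAC) */
  MPI_Comm parent;
  MPI_Comm_get_parent(&parent);
  MPI_Intercomm_merge(parent, 1, &intracommAB);    /* high = 1: B ranks after A */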

The calls for step 3 are:

  MPI_Comm intercommABC;
  A: MPI_Intercomm_create(intracommAB, 0, intracommAC, NPA, TAG, &intercommABC);
  B: MPI_Intercomm_create(intracommAB, 0, MPI_COMM_NULL,  0,  TAG, &intercommABC);
  C: MPI_Intercomm_create(intracommC,  0, intracommAC,    0,  TAG, &intercommABC);

In these calls, A0 and C0 play the role of local leader for AB and C
respectively, and C0 and A0 play the role of remote leader (ranks NPA and 0)
in the bridge intracomm AC.

  4) MPI_Barrier(intercommABC);
  5) I merge intercomm AB-C into intracomm ABC;
  6) MPI_Barrier(intracommABC);
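
Steps 4 to 6 look roughly like this on every process (simplified sketch, not
my exact code; the 'high' values of the merge, 0 on the AB side and 1 on the
C side, are illustrative):

  MPI_Barrier(intercommABC);                        /* step 4: succeeds */
  MPI_Comm intracommABC;
  MPI_Intercomm_merge(intercommABC, 0 /* 1 on C */, &intracommABC);
  MPI_Barrier(intracommABC);                        /* step 6: fails    */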

My bug: these calls succeed, but when I try to use intracommABC for a
collective communication such as MPI_Barrier(), I get the following error:

*** An error occurred in MPI_Barrier
*** on communicator
*** MPI_ERR_INTERN: internal error
*** MPI_ERRORS_ARE_FATAL: your MPI job will now abort


I have tried with Open MPI trunk, 1.5.3, 1.5.4, and MPICH2 1.4.1p1.

My code works perfectly if intracomms A, B and C are obtained by MPI_Comm_split() instead of MPI_Comm_spawn()!
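
For comparison, the MPI_Comm_split() variant looks roughly like this
(simplified sketch; the color computation from the rank is just an
illustration): all NPA+NPB+NPC processes start in one MPI_COMM_WORLD and the
intracomms are obtained by splitting it:

  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  int color = (rank < NPA) ? 0 : (rank < NPA + NPB) ? 1 : 2;  /* 0=A, 1=B, 2=C */
  MPI_Comm intracommLocal;                          /* intracomm A, B or C */
  MPI_Comm_split(MPI_COMM_WORLD, color, rank, &intracommLocal);
  /* intracommAB and intracommAC are built the same way with two-way splits;
     the rest of the code (steps 3 to 6) is unchanged and works */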


I found the same problem reported in a previous thread on the OMPI users mailing list:

  => http://www.open-mpi.org/community/lists/users/2011/06/16711.php

Is this bug/problem currently under investigation? :-)

I can provide detailed code, but the example given by George Bosilca in that previous thread produces the same error...

Thank you for your help...

--
Aurélien Esnard
University Bordeaux 1 / LaBRI / INRIA (France)
