Strange -- this almost implies a race condition somewhere. I don't see anything wrong with your application (other than that it doesn't free the communicators, but that's not an error).
Edgar -- the intercomm code is yours. Could you have a look?

On Jan 23, 2012, at 11:03 AM, jody wrote:

> Hi
> I've got a really strange problem:
>
> I've got an application which creates intercommunicators between a
> master and some workers.
>
> When I run it on our cluster with 11 processes it works;
> when I run it with 12 processes it hangs inside MPI_Intercomm_create().
>
> This is the hostfile:
> squid_0.uzh.ch slots=3 max-slots=3
> squid_1.uzh.ch slots=2 max-slots=2
> squid_2.uzh.ch slots=1 max-slots=1
> squid_3.uzh.ch slots=1 max-slots=1
> triops.uzh.ch slots=8 max-slots=8
>
> Actually all squid_X have 4 cores, but I managed to reduce the number of
> processes needed to trigger the failure by using the above settings.
>
> So with all available squid cores and 3 triops cores it works,
> but with 4 triops cores it hangs.
>
> On the other hand, if I use all 16 squid cores (but no triops cores)
> it works, too.
>
> If I start the application not from triops but from another workstation,
> I get a similar pattern of Intercomm_create failures.
>
> Note that with the above hostfile a simple HelloMPI also works with 14
> or more processes.
>
> The frustrating thing is that this exact same code has worked before!
>
> Does anybody have an explanation?
> Thank You
>
> I managed to simplify the application:
>
> #include <stdio.h>
> #include "mpi.h"
>
> int main(int iArgC, char *apArgV[]) {
>     int iResult = 0;
>     int iNumProcs = 0;
>     int iID = -1;
>
>     MPI_Init(&iArgC, &apArgV);
>
>     MPI_Comm_size(MPI_COMM_WORLD, &iNumProcs);
>     MPI_Comm_rank(MPI_COMM_WORLD, &iID);
>
>     int iKey;
>     if (iID == 0) {
>         iKey = 0;
>     } else {
>         iKey = 1;
>     }
>
>     MPI_Comm commInter1;
>     MPI_Comm commInter2;
>     MPI_Comm commIntra;
>
>     MPI_Comm_split(MPI_COMM_WORLD, iKey, iID, &commIntra);
>
>     int iRankM;
>     MPI_Comm_rank(commIntra, &iRankM);
>     printf("Local rank: %d\n", iRankM);
>
>     switch (iKey) {
>     case 0:
>         printf("Creating intercomm 1 for Master (%d)\n", iID);
>         MPI_Intercomm_create(commIntra, 0, MPI_COMM_WORLD, 1, 01, &commInter2);
>         break;
>     case 1:
>         printf("Creating intercomm 1 for FH (%d)\n", iID);
>         MPI_Intercomm_create(commIntra, 0, MPI_COMM_WORLD, 0, 01, &commInter1);
>     }
>
>     printf("finalizing\n");
>     MPI_Finalize();
>
>     printf("exiting with %d\n", iResult);
>     return iResult;
> }

-- 
Jeff Squyres
jsquy...@cisco.com