What version did you upgrade to? (We don't control the Ubuntu packaging.)

I see a bullet in the soon-to-be-released 1.4.5 release notes:

- Fix obscure cases where MPI_ALLGATHER could crash. Thanks to Andrew
  Senin for reporting the problem.

But it would be surprising if this is what fixed your issue, especially since it's not released yet. :-)


On Jan 26, 2012, at 5:24 AM, Brett Tully wrote:

> As of two days ago, this problem has disappeared and the tests that I had
> written and run each night are now passing. Having looked through the update
> log of my machine (Ubuntu 11.10) it appears as though I got a new version of
> mpi-default-dev (0.6ubuntu1). I would like to understand this problem in more
> detail -- is it possible to see what changed in this update?
> Thanks,
> Brett.
>
>
>
> On Fri, Dec 9, 2011 at 6:43 PM, teng ma <t...@eecs.utk.edu> wrote:
> I guess your output is from different ranks. You can add the rank inside the
> print to tell them apart, as follows:
>
> (void) printf("rank %d: gathered[%d].node = %d\n", rank, i, gathered[i].node);
>
> From my side, I did not see anything wrong with your code in Open MPI 1.4.3.
> After I add the rank, the output is
> rank 5: gathered[0].node = 0
> rank 5: gathered[1].node = 1
> rank 5: gathered[2].node = 2
> rank 5: gathered[3].node = 3
> rank 5: gathered[4].node = 4
> rank 5: gathered[5].node = 5
> rank 3: gathered[0].node = 0
> rank 3: gathered[1].node = 1
> rank 3: gathered[2].node = 2
> rank 3: gathered[3].node = 3
> rank 3: gathered[4].node = 4
> rank 3: gathered[5].node = 5
> rank 1: gathered[0].node = 0
> rank 1: gathered[1].node = 1
> rank 1: gathered[2].node = 2
> rank 1: gathered[3].node = 3
> rank 1: gathered[4].node = 4
> rank 1: gathered[5].node = 5
> rank 0: gathered[0].node = 0
> rank 0: gathered[1].node = 1
> rank 0: gathered[2].node = 2
> rank 0: gathered[3].node = 3
> rank 0: gathered[4].node = 4
> rank 0: gathered[5].node = 5
> rank 4: gathered[0].node = 0
> rank 4: gathered[1].node = 1
> rank 4: gathered[2].node = 2
> rank 4: gathered[3].node = 3
> rank 4: gathered[4].node = 4
> rank 4: gathered[5].node = 5
> rank 2: gathered[0].node = 0
> rank 2: gathered[1].node = 1
> rank 2: gathered[2].node = 2
> rank 2: gathered[3].node = 3
> rank 2: gathered[4].node = 4
> rank 2: gathered[5].node = 5
>
> Is that what you expected?
>
> On Fri, Dec 9, 2011 at 12:03 PM, Brett Tully <brett.tu...@oxyntix.com> wrote:
> Dear all,
>
> I have not used OpenMPI much before, but am maintaining a large legacy
> application. We noticed a bug to do with a call to MPI_Allgather as
> summarised in this post to Stackoverflow:
> http://stackoverflow.com/questions/8445398/mpi-allgather-produces-inconsistent-results
>
> In the process of looking further into the problem, I noticed that the
> following function results in strange behaviour.
>
> void test_all_gather() {
>
>     struct _TEST_ALL_GATHER {
>         int node;
>     };
>
>     int ierr, size, rank;
>     ierr = MPI_Comm_size(MPI_COMM_WORLD, &size);
>     ierr = MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>
>     struct _TEST_ALL_GATHER local;
>     struct _TEST_ALL_GATHER *gathered;
>
>     gathered = (struct _TEST_ALL_GATHER*) malloc(size * sizeof(*gathered));
>
>     local.node = rank;
>
>     MPI_Allgather(&local, sizeof(struct _TEST_ALL_GATHER), MPI_BYTE,
>                   gathered, sizeof(struct _TEST_ALL_GATHER), MPI_BYTE, MPI_COMM_WORLD);
>
>     int i;
>     for (i = 0; i < numnodes; ++i) {
>         (void) printf("gathered[%d].node = %d\n", i, gathered[i].node);
>     }
>
>     FREE(gathered);
> }
>
> At one point, this function printed the following:
> gathered[0].node = 2
> gathered[1].node = 3
> gathered[2].node = 2
> gathered[3].node = 3
> gathered[4].node = 4
> gathered[5].node = 5
>
> Can anyone suggest a place to start looking into why this might be happening?
> There is a section of the code that calls MPI_Comm_split, but I am not sure
> if that is related...
>
> Running on Ubuntu 11.10 and a summary of ompi_info:
> Package: Open MPI buildd@allspice Distribution
> Open MPI: 1.4.3
> Open MPI SVN revision: r23834
> Open MPI release date: Oct 05, 2010
> Open RTE: 1.4.3
> Open RTE SVN revision: r23834
> Open RTE release date: Oct 05, 2010
> OPAL: 1.4.3
> OPAL SVN revision: r23834
> OPAL release date: Oct 05, 2010
> Ident string: 1.4.3
> Prefix: /usr
> Configured architecture: x86_64-pc-linux-gnu
> Configure host: allspice
> Configured by: buildd
>
> Thanks!
> Brett
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
>
> --
> | Teng Ma          Univ. of Tennessee |
> | t...@cs.utk.edu        Knoxville, TN |
> | http://web.eecs.utk.edu/~tma/       |

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
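
For anyone who wants to catch this kind of result automatically rather than by reading the printed output, here is a minimal sketch of a check that could run on each rank right after the MPI_Allgather call in the quoted test. It assumes, as in that test, that every rank contributes its own rank number in `node`, so entry i should hold i. The struct name `test_all_gather` and the helpers `first_mismatch` and `report_gathered` are hypothetical names made up for this sketch; they are not part of MPI or of the original application. The sketch makes no MPI calls, so it compiles without an MPI installation.

```c
#include <assert.h>
#include <stdio.h>

/* Mirrors the one-int struct from the quoted test (renamed here because
 * identifiers starting with an underscore and a capital are reserved in C). */
struct test_all_gather {
    int node;
};

/* Hypothetical helper: return the index of the first entry whose node
 * field differs from its position, or -1 if every entry matches. */
static int first_mismatch(const struct test_all_gather *gathered, int size)
{
    assert(gathered != NULL);
    for (int i = 0; i < size; ++i) {
        if (gathered[i].node != i) {
            return i;
        }
    }
    return -1;
}

/* Hypothetical helper: report the first bad entry on stderr, tagged with
 * the calling rank as suggested in the thread. Returns the same value
 * as first_mismatch(). */
static int report_gathered(int rank, const struct test_all_gather *gathered,
                           int size)
{
    int bad = first_mismatch(gathered, size);
    if (bad >= 0) {
        fprintf(stderr, "rank %d: gathered[%d].node = %d (expected %d)\n",
                rank, bad, gathered[bad].node, bad);
    }
    return bad;
}
```

With the values reported in the original post (2, 3, 2, 3, 4, 5), `first_mismatch` returns 0. With the output teng ma posted, it returns -1 on every rank. In the quoted test, the call would be something like `report_gathered(rank, gathered, size)`, using `size` from MPI_Comm_size as the entry count.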