FWIW, I filed https://svn.open-mpi.org/trac/ompi/ticket/2241 about this.
Thanks Jed!

On Feb 6, 2010, at 10:56 AM, Jed Brown wrote:

> On Fri, 5 Feb 2010 14:28:40 -0600, Barry Smith <bsm...@mcs.anl.gov> wrote:
> > To cheer you up, when I run with openMPI it runs forever sucking down
> > 100% CPU trying to send the messages :-)
>
> On my test box (x86 with 8GB memory), Open MPI (1.4.1) does complete
> after several seconds, but still prints the wrong count.
>
> MPICH2 does not actually send the message, as you can see by running the
> attached code.
>
> # Open MPI 1.4.1, correct cols[0]
> [0] sending...
> [1] receiving...
> count -103432106, cols[0] 0
>
> # MPICH2 1.2.1, incorrect cols[1]
> [1] receiving...
> [0] sending...
> [1] count -103432106, cols[0] 1
>
>
> How much memory does crush have (you need about 7GB to do this without
> swapping)? In particular, most of the time it took Open MPI to send the
> message (with your source) was actually just spent faulting the
> send/recv buffers. The attached faults the buffers first, and the
> subsequent send/recv takes less than 2 seconds.
>
> Actually, it's clear that MPICH2 never touches either buffer because it
> returns immediately regardless of whether they have been faulted first.
>
> Jed
>
>
> #include <mpi.h>
> #include <stdio.h>
> #include <stdlib.h>
>
> int main(int argc,char **argv)
> {
>   int ierr,i,size,rank;
>   int cnt = 433438806;
>   MPI_Status status;
>   long long *cols;
>
>   MPI_Init(&argc,&argv);
>   ierr = MPI_Comm_size(MPI_COMM_WORLD,&size);
>   ierr = MPI_Comm_rank(MPI_COMM_WORLD,&rank);
>   if (size != 2) {
>     fprintf(stderr,"[%d] usage: mpiexec -n 2 %s\n",rank,argv[0]);
>     MPI_Abort(MPI_COMM_WORLD,1);
>   }
>
>   cols = malloc(cnt*sizeof(long long));
>   for (i=0; i<cnt; i++) cols[i] = rank;
>   if (rank == 0) {
>     printf("[%d] sending...\n",rank);
>     ierr = MPI_Send(cols,cnt,MPI_LONG_LONG_INT,1,0,MPI_COMM_WORLD);
>   } else {
>     printf("[%d] receiving...\n",rank);
>     ierr = MPI_Recv(cols,cnt,MPI_LONG_LONG_INT,0,0,MPI_COMM_WORLD,&status);
>     ierr = MPI_Get_count(&status,MPI_LONG_LONG_INT,&cnt);
>     printf("[%d] count %d, cols[0] %lld\n",rank,cnt,cols[0]);
>   }
>   ierr = MPI_Finalize();
>   return 0;
> }

-- 
Jeff Squyres
jsquy...@cisco.com