FWIW, I filed https://svn.open-mpi.org/trac/ompi/ticket/2241 about this.

Thanks Jed!


On Feb 6, 2010, at 10:56 AM, Jed Brown wrote:

> On Fri, 5 Feb 2010 14:28:40 -0600, Barry Smith <bsm...@mcs.anl.gov> wrote:
> > To cheer you up, when I run with openMPI it runs forever sucking down 
> > 100% CPU trying to send the messages :-)
> 
> On my test box (x86 with 8GB memory), Open MPI (1.4.1) does complete
> after several seconds, but still prints the wrong count.
> 
> MPICH2 does not actually send the message, as you can see by running the
> attached code.
> 
>   # Open MPI 1.4.1, correct cols[0]
>   [0] sending...
>   [1] receiving...
>   count -103432106, cols[0] 0
> 
>   # MPICH2 1.2.1, incorrect cols[0]
>   [1] receiving...
>   [0] sending...
>   [1] count -103432106, cols[0] 1
> 
> 
> How much memory does crush have (you need about 7GB to do this without
> swapping)?  In particular, most of the time it took Open MPI to send the
> message (with your source) was actually just spent faulting the
> send/recv buffers.  The attached code faults the buffers first, and the
> subsequent send/recv takes less than 2 seconds.
> 
> Actually, it's clear that MPICH2 never touches either buffer because it
> returns immediately regardless of whether they have been faulted first.
> 
> Jed
> 
> 
> #include <mpi.h>
> #include <stdio.h>
> #include <stdlib.h>
> 
> int main(int argc,char **argv)
> {
>  int        ierr,i,size,rank;
>  int        cnt = 433438806;
>  MPI_Status status;
>  long long  *cols;
> 
>  MPI_Init(&argc,&argv);
>  ierr = MPI_Comm_size(MPI_COMM_WORLD,&size);
>  ierr = MPI_Comm_rank(MPI_COMM_WORLD,&rank);
>  if (size != 2) {
>    fprintf(stderr,"[%d] usage: mpiexec -n 2 %s\n",rank,argv[0]);
>    MPI_Abort(MPI_COMM_WORLD,1);
>  }
> 
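>  /* about 3.5 GB per rank, so about 7 GB for both ranks on one box */
>  cols = malloc(cnt*sizeof(long long));
>  if (!cols) {
>    fprintf(stderr,"[%d] malloc failed\n",rank);
>    MPI_Abort(MPI_COMM_WORLD,1);
>  }
>  /* touch every entry so the buffer is faulted in before the transfer */
>  for (i=0; i<cnt; i++) cols[i] = rank;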
>  if (rank == 0) {
>    printf("[%d] sending...\n",rank);
>    ierr = MPI_Send(cols,cnt,MPI_LONG_LONG_INT,1,0,MPI_COMM_WORLD);
>  } else {
>    printf("[%d] receiving...\n",rank);
>    ierr = MPI_Recv(cols,cnt,MPI_LONG_LONG_INT,0,0,MPI_COMM_WORLD,&status);
>    ierr = MPI_Get_count(&status,MPI_LONG_LONG_INT,&cnt);
>    printf("[%d] count %d, cols[0] %lld\n",rank,cnt,cols[0]);
>  }
>  free(cols);
>  ierr = MPI_Finalize();
>  return 0;
> }
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
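
For reference, the bogus count in the output above matches what you would get if the message length in bytes were held in a signed 32-bit integer somewhere along the way. That is only an assumption about the cause; nothing in this thread says where the overflow actually happens. A minimal sketch of the arithmetic, using a hypothetical helper wrapped_count() that is not part of any MPI implementation:

```c
#include <stdint.h>

/* Hypothetical helper (not MPI API): returns the element count that would be
 * reported if the message length in bytes were truncated to a signed 32-bit
 * integer and then divided by the element size.  The truncation itself is an
 * assumption about the failure mode, not something confirmed in the thread. */
long long wrapped_count(long long nelem, long long elem_size)
{
  /* true length in bytes, computed in 64 bits */
  uint64_t bytes = (uint64_t)nelem * (uint64_t)elem_size;
  /* keep only the low 32 bits, as a 32-bit field would */
  uint32_t low = (uint32_t)bytes;
  /* reinterpret those 32 bits as a two's complement signed value */
  long long as_signed = (low >= 0x80000000u) ? (long long)low - 4294967296LL
                                             : (long long)low;
  return as_signed / elem_size;
}
```

With the values from the test program (433438806 elements of 8 bytes each), the true length is 3467510448 bytes, which wraps to -827456848, and wrapped_count(433438806, 8) gives -103432106, the count printed by both Open MPI and MPICH2.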


-- 
Jeff Squyres
jsquy...@cisco.com

For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
