On Fri, 5 Feb 2010 14:28:40 -0600, Barry Smith <bsm...@mcs.anl.gov> wrote:
> To cheer you up, when I run with openMPI it runs forever sucking down
> 100% CPU trying to send the messages :-)
On my test box (x86 with 8GB memory), Open MPI (1.4.1) does complete after
several seconds, but still prints the wrong count.  MPICH2 does not actually
send the message, as you can see by running the attached code.

  # Open MPI 1.4.1, correct cols[0]
  [0] sending...
  [1] receiving...
  count -103432106, cols[0] 0

  # MPICH2 1.2.1, incorrect cols[0]
  [1] receiving...
  [0] sending...
  [1] count -103432106, cols[0] 1

How much memory does crush have (you need about 7GB to do this without
swapping)?  In particular, most of the time it took Open MPI to send the
message (with your source) was actually just spent faulting the send/recv
buffers.  The attached code faults the buffers first, and the subsequent
send/recv takes less than 2 seconds.  Actually, it's clear that MPICH2 never
touches either buffer, because it returns immediately regardless of whether
the buffers have been faulted first.

Jed
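For what it's worth, the bogus count is consistent with the byte count
overflowing a signed 32-bit int somewhere in the status: 433438806 long
longs is 3467510448 bytes, which wraps (two's complement) to -827456848,
and dividing by 8 gives exactly -103432106.  A minimal sketch of that
arithmetic, plain C with no MPI, just to check the numbers:

#include <stdio.h>

int main(void)
{
  int cnt = 433438806;                   /* elements sent */
  long long bytes = (long long)cnt * 8;  /* true size: 3467510448 bytes */
  int wrapped = (int)bytes;              /* implementation-defined, but on the
                                            usual two's-complement machines
                                            this gives -827456848 */
  printf("true bytes %lld, wrapped %d, wrapped/8 %d\n",bytes,wrapped,wrapped/8);
  return 0;
}

The last field comes out as -103432106, matching the count printed above.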
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc,char **argv)
{
  int ierr,i,size,rank;
  int cnt = 433438806;
  MPI_Status status;
  long long *cols;

  MPI_Init(&argc,&argv);
  ierr = MPI_Comm_size(MPI_COMM_WORLD,&size);
  ierr = MPI_Comm_rank(MPI_COMM_WORLD,&rank);
  if (size != 2) {
    fprintf(stderr,"[%d] usage: mpiexec -n 2 %s\n",rank,argv[0]);
    MPI_Abort(MPI_COMM_WORLD,1);
  }

  cols = malloc(cnt*sizeof(long long));
  if (!cols) {
    fprintf(stderr,"[%d] malloc of %d long longs failed\n",rank,cnt);
    MPI_Abort(MPI_COMM_WORLD,1);
  }
  /* Fault every page of the buffer up front so the send/recv is not
     dominated by page faults. */
  for (i=0; i<cnt; i++) cols[i] = rank;

  if (rank == 0) {
    printf("[%d] sending...\n",rank);
    ierr = MPI_Send(cols,cnt,MPI_LONG_LONG_INT,1,0,MPI_COMM_WORLD);
  } else {
    printf("[%d] receiving...\n",rank);
    ierr = MPI_Recv(cols,cnt,MPI_LONG_LONG_INT,0,0,MPI_COMM_WORLD,&status);
    ierr = MPI_Get_count(&status,MPI_LONG_LONG_INT,&cnt);
    printf("[%d] count %d, cols[0] %lld\n",rank,cnt,cols[0]);
  }
  free(cols);
  ierr = MPI_Finalize();
  return 0;
}
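If it helps to see where the time goes, the faulting pass and the send/recv
can be timed separately with MPI_Wtime.  This is not part of the attachment,
just a sketch of the same 2-process setup with the timing calls added:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc,char **argv)
{
  int i,size,rank,cnt = 433438806;
  double t0,t1,t2;
  MPI_Status status;
  long long *cols;

  MPI_Init(&argc,&argv);
  MPI_Comm_size(MPI_COMM_WORLD,&size);
  MPI_Comm_rank(MPI_COMM_WORLD,&rank);
  if (size != 2) MPI_Abort(MPI_COMM_WORLD,1);

  cols = malloc(cnt*sizeof(long long));
  if (!cols) MPI_Abort(MPI_COMM_WORLD,1);

  t0 = MPI_Wtime();
  for (i=0; i<cnt; i++) cols[i] = rank;   /* fault the whole buffer */
  t1 = MPI_Wtime();
  if (rank == 0) MPI_Send(cols,cnt,MPI_LONG_LONG_INT,1,0,MPI_COMM_WORLD);
  else           MPI_Recv(cols,cnt,MPI_LONG_LONG_INT,0,0,MPI_COMM_WORLD,&status);
  t2 = MPI_Wtime();

  printf("[%d] fault %g s, send/recv %g s\n",rank,t1-t0,t2-t1);
  free(cols);
  MPI_Finalize();
  return 0;
}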