Hi,

Thank you for the reply. However, using MPI_Waitall instead of MPI_Wait didn't solve the problem: the code still hangs at the MPI_Waitall call. Also, I don't quite understand why the code is inherently unsafe. Can a non-blocking send or receive cause a deadlock?
Thanks!

Kong

On Mon, Feb 21, 2011 at 2:32 PM, Jeff Squyres <jsquy...@cisco.com> wrote:
> It's because you're waiting on the receive request to complete before the
> send request. This likely works locally because the message transfer is
> through shared memory and is fast, but it's still an inherently unsafe way to
> block waiting for completion (i.e., the receive might not complete if the
> send does not complete).
>
> What you probably want to do is build an array of 2 requests and then issue a
> single MPI_Waitall() on both of them. This will allow MPI to progress both
> requests simultaneously.
>
>
> On Feb 18, 2011, at 11:58 AM, Xianglong Kong wrote:
>
>> Hi, all,
>>
>> I’m an mpi newbie. I’m trying to connect two desktops in my office
>> with each other using a crossing cable and implement a parallel code
>> on them using MPI.
>>
>> Now, the two nodes can ssh to each other without password, and can
>> successfully run the MPI “Hello world” code. However, when I tried to
>> use multiple MPI non-blocking sends or receives, the job would hang.
>> The problem only showed up if the two processes are launched in the
>> different nodes, the code can run successfully if the two processes
>> are launched in the same node. Also, the code can run successfully if
>> there are only one send or/and one receive in each process.
>>
>> Here is the code that can run successfully:
>>
>> #include <stdlib.h>
>> #include <stdio.h>
>> #include <string.h>
>> #include <mpi.h>
>>
>> int main(int argc, char** argv) {
>>
>>     int myrank, nprocs;
>>
>>     MPI_Init(&argc, &argv);
>>     MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
>>     MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
>>
>>     printf("Hello from processor %d of %d\n", myrank, nprocs);
>>
>>     MPI_Request reqs1, reqs2;
>>     MPI_Status stats1, stats2;
>>
>>     int tag1=10;
>>     int tag2=11;
>>
>>     int buf;
>>     int mesg;
>>     int source=1-myrank;
>>     int dest=1-myrank;
>>
>>     if(myrank==0)
>>     {
>>         mesg=1;
>>
>>         MPI_Irecv(&buf, 1, MPI_INT, source, tag1, MPI_COMM_WORLD, &reqs1);
>>         MPI_Isend(&mesg, 1, MPI_INT, dest, tag2, MPI_COMM_WORLD, &reqs2);
>>
>>     }
>>
>>     if(myrank==1)
>>     {
>>         mesg=2;
>>
>>         MPI_Irecv(&buf, 1, MPI_INT, source, tag2, MPI_COMM_WORLD, &reqs1);
>>         MPI_Isend(&mesg, 1, MPI_INT, dest, tag1, MPI_COMM_WORLD, &reqs2);
>>     }
>>
>>     MPI_Wait(&reqs1, &stats1);
>>     printf("myrank=%d,received the message\n",myrank);
>>
>>     MPI_Wait(&reqs2, &stats2);
>>     printf("myrank=%d,sent the messages\n",myrank);
>>
>>     printf("myrank=%d, buf=%d\n",myrank, buf);
>>
>>     MPI_Finalize();
>>     return 0;
>> }
>>
>> And here is the code that will hang
>>
>> #include <stdlib.h>
>> #include <stdio.h>
>> #include <string.h>
>> #include <mpi.h>
>>
>> int main(int argc, char** argv) {
>>
>>     int myrank, nprocs;
>>
>>     MPI_Init(&argc, &argv);
>>     MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
>>     MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
>>
>>     printf("Hello from processor %d of %d\n", myrank, nprocs);
>>
>>     MPI_Request reqs1, reqs2;
>>     MPI_Status stats1, stats2;
>>
>>     int tag1=10;
>>     int tag2=11;
>>
>>     int source=1-myrank;
>>     int dest=1-myrank;
>>
>>     if(myrank==0)
>>     {
>>         int buf1, buf2;
>>
>>         MPI_Irecv(&buf1, 1, MPI_INT, source, tag1, MPI_COMM_WORLD, &reqs1);
>>         MPI_Irecv(&buf2, 1, MPI_INT, source, tag2, MPI_COMM_WORLD, &reqs2);
>>
>>         MPI_Wait(&reqs1, &stats1);
>>         printf("received one message\n");
>>
>>         MPI_Wait(&reqs2, &stats2);
>>         printf("received two messages\n");
>>
>>         printf("myrank=%d, buf1=%d, buf2=%d\n",myrank, buf1, buf2);
>>     }
>>
>>     if(myrank==1)
>>     {
>>         int mesg1=1;
>>         int mesg2=2;
>>
>>         MPI_Isend(&mesg1, 1, MPI_INT, dest, tag1, MPI_COMM_WORLD, &reqs1);
>>         MPI_Isend(&mesg2, 1, MPI_INT, dest, tag2, MPI_COMM_WORLD, &reqs2);
>>
>>         MPI_Wait(&reqs1, &stats1);
>>         printf("sent one message\n");
>>
>>         MPI_Wait(&reqs2, &stats2);
>>         printf("sent two messages\n");
>>     }
>>
>>     MPI_Finalize();
>>     return 0;
>> }
>>
>> And the output of the second failed code:
>> ***********************************************
>> Hello from processor 0 of 2
>>
>> Received one message
>>
>> Hello from processor 1 of 2
>>
>> Sent one message
>> *******************************************************
>>
>> Can anyone help to point out why the second code didn't work?
>>
>> Thanks!
>>
>> Kong
>>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>

--
Xianglong Kong
Department of Mechanical Engineering
University of Rochester
Phone: (585)520-4412
MSN: dinosaur8...@hotmail.com