Running with shared memory enabled gave me the following error:

mpprun INFO: Starting openmpi run on 2 nodes (16 ranks)...
--------------------------------------------------------------------------
A requested component was not found, or was unable to be opened.  This
means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded).  Note that
Open MPI stopped checking at the first component that it did not find.

Host:      n568
Framework: btl
Component: tcp
--------------------------------------------------------------------------

Maybe it is not installed at our supercomputing center. What do you suggest?
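(If the tcp BTL component really is not installed, ompi_info can confirm it. Assuming the ompi_info that belongs to the same Open MPI installation is on the PATH, something like

    ompi_info | grep btl

lists the BTL components that were built in, e.g. self, sm, and tcp. If tcp does not appear there, Open MPI on that system was built without the TCP BTL, and requesting it with "-mca btl" cannot work.)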
best regards,

----- Forwarded Message -----
From: Mudassar Majeed <mudassar...@yahoo.com>
To: Jeff Squyres <jsquy...@cisco.com>
Sent: Friday, June 1, 2012 5:03 PM
Subject: Re: [OMPI users] Intra-node communication

Here is the code; I am taking care of the first message, and I start measuring the round-trip time from the second message. As you can see in the code, I do 100 handshakes and take the overall time for them. I have two nodes, each with 8 cores. First I exchange messages between process 1 and process 2, because they are on the same node, and measure the time. Then I exchange messages between process 1 and process 12, as they are on different nodes. But the output I got is as follows:

---------------------------------------------------------------------------------
mpprun INFO: Starting openmpi run on 2 nodes (16 ranks)...
with-in node: time = 150.663382 secs
across nodes: time = 134.627887 secs
---------------------------------------------------------------------------------

The code is as follows:

int my_rank;
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

MPI_Status status;
struct timespec stime, etime;

double *buff = new double[1000000];
double ex_time = 0.0;
for (int i = 0; i < 1000000; i++)
    buff[i] = 100.5352;

MPI_Barrier(MPI_COMM_WORLD);

int comm_amount = 100;   // *(comm + my_rank * N + i);
if (comm_amount > 0)
{
    /* ranks 1 and 2 are on the same node */
    if (my_rank == 1)
    {
        for (int j = 0; j < comm_amount; j++)
        {
            if (j > 0)   /* do not time the first exchange */
                clock_gettime(CLOCK_REALTIME, &stime);

            MPI_Ssend(buff, 1000000, MPI_DOUBLE, 2, 4600, MPI_COMM_WORLD);
            MPI_Recv(buff, 1000000, MPI_DOUBLE, 2, 4600, MPI_COMM_WORLD, &status);

            if (j > 0)
            {
                clock_gettime(CLOCK_REALTIME, &etime);
                ex_time += (etime.tv_sec - stime.tv_sec)
                         + 1e-9 * (etime.tv_nsec - stime.tv_nsec);
            }
        }
    }
    else if (my_rank == 2)
    {
        for (int j = 0; j < comm_amount; j++)
        {
            if (j > 0)
                clock_gettime(CLOCK_REALTIME, &stime);

            MPI_Recv(buff, 1000000, MPI_DOUBLE, 1, 4600, MPI_COMM_WORLD, &status);
            MPI_Ssend(buff, 1000000, MPI_DOUBLE, 1, 4600, MPI_COMM_WORLD);

            if (j > 0)
            {
                clock_gettime(CLOCK_REALTIME, &etime);
                ex_time += (etime.tv_sec - stime.tv_sec)
                         + 1e-9 * (etime.tv_nsec - stime.tv_nsec);
            }
        }
    }

    if (my_rank == 1)
        printf("\nwith-in node: time = %f\n", ex_time);

    ex_time = 0.0;

    /* ranks 1 and 12 are on different nodes */
    if (my_rank == 1)
    {
        for (int j = 0; j < comm_amount; j++)
        {
            if (j > 0)
                clock_gettime(CLOCK_REALTIME, &stime);

            MPI_Ssend(buff, 1000000, MPI_DOUBLE, 12, 4600, MPI_COMM_WORLD);
            MPI_Recv(buff, 1000000, MPI_DOUBLE, 12, 4600, MPI_COMM_WORLD, &status);

            if (j > 0)
            {
                clock_gettime(CLOCK_REALTIME, &etime);
                ex_time += (etime.tv_sec - stime.tv_sec)
                         + 1e-9 * (etime.tv_nsec - stime.tv_nsec);
            }
        }
    }
    else if (my_rank == 12)
    {
        for (int j = 0; j < comm_amount; j++)
        {
            if (j > 0)
                clock_gettime(CLOCK_REALTIME, &stime);

            MPI_Recv(buff, 1000000, MPI_DOUBLE, 1, 4600, MPI_COMM_WORLD, &status);
            MPI_Ssend(buff, 1000000, MPI_DOUBLE, 1, 4600, MPI_COMM_WORLD);

            if (j > 0)
            {
                clock_gettime(CLOCK_REALTIME, &etime);
                ex_time += (etime.tv_sec - stime.tv_sec)
                         + 1e-9 * (etime.tv_nsec - stime.tv_nsec);
            }
        }
    }

    if (my_rank == 1)
        printf("\nacross nodes: time = %f\n", ex_time);
}

This time I have added -mca btl self,sm,tcp; maybe that will enable the shared memory support. But I had to run it with mpprun (not mpirun), as I have to submit a job and can't use mpirun directly on the supercomputer.
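(As a cross-check, a self-contained version of this measurement that follows the warmup advice in Jeff's message quoted below, doing one untimed round trip per pair and then timing with MPI_Wtime, might look like the sketch here. The ranks, tag, and message size are taken from the code above; that ranks 1 and 2 share a node while ranks 1 and 12 do not is an assumption about how the scheduler maps 16 ranks onto the two 8-core nodes.

#include <mpi.h>
#include <cstdio>
#include <vector>

/* Timed ping-pong between ranks a and b: one untimed warmup round
 * trip, so the lazy connection setup is excluded, then `iters` timed
 * round trips.  Returns elapsed seconds on rank a, 0.0 elsewhere. */
static double pingpong(int a, int b, int iters, std::vector<double> &buf)
{
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Status status;
    const int n   = (int)buf.size();
    const int tag = 4600;
    double elapsed = 0.0;

    if (rank == a) {
        /* untimed warmup round trip */
        MPI_Ssend(&buf[0], n, MPI_DOUBLE, b, tag, MPI_COMM_WORLD);
        MPI_Recv(&buf[0], n, MPI_DOUBLE, b, tag, MPI_COMM_WORLD, &status);

        double t0 = MPI_Wtime();
        for (int j = 0; j < iters; j++) {
            MPI_Ssend(&buf[0], n, MPI_DOUBLE, b, tag, MPI_COMM_WORLD);
            MPI_Recv(&buf[0], n, MPI_DOUBLE, b, tag, MPI_COMM_WORLD, &status);
        }
        elapsed = MPI_Wtime() - t0;
    } else if (rank == b) {
        /* matching warmup on the other side */
        MPI_Recv(&buf[0], n, MPI_DOUBLE, a, tag, MPI_COMM_WORLD, &status);
        MPI_Ssend(&buf[0], n, MPI_DOUBLE, a, tag, MPI_COMM_WORLD);

        for (int j = 0; j < iters; j++) {
            MPI_Recv(&buf[0], n, MPI_DOUBLE, a, tag, MPI_COMM_WORLD, &status);
            MPI_Ssend(&buf[0], n, MPI_DOUBLE, a, tag, MPI_COMM_WORLD);
        }
    }
    return elapsed;
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    std::vector<double> buf(1000000, 100.5352);  /* ~8 MB per message */
    const int iters = 100;

    MPI_Barrier(MPI_COMM_WORLD);
    double within = pingpong(1, 2, iters, buf);  /* assumed: same node      */
    MPI_Barrier(MPI_COMM_WORLD);
    double across = pingpong(1, 12, iters, buf); /* assumed: different nodes */

    if (rank == 1)
        printf("within node: %f s, across nodes: %f s\n", within, across);

    MPI_Finalize();
    return 0;
}

Compiled with mpic++ and submitted the same way as above, rank 1 prints both timings.)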
Thanks for your help,
best

________________________________
From: Jeff Squyres <jsquy...@cisco.com>
To: Open MPI Users <us...@open-mpi.org>
Cc: Mudassar Majeed <mudassar...@yahoo.com>
Sent: Friday, June 1, 2012 4:52 PM
Subject: Re: [OMPI users] Intra-node communication

...and exactly how you measured. You might want to run a well-known benchmark, like NetPIPE or the OSU pt2pt benchmarks.

Note that the *first* send between any given peer pair is likely to be slow because OMPI does a lazy connection scheme (i.e., the connection is made behind the scenes). Subsequent sends are likely faster. Well-known benchmarks do a bunch of warmup sends and then start timing after those are all done.

Also ensure that you have shared memory support enabled. It is likely to be enabled by default, but if you're seeing different performance than you expect, that's something to check.

On Jun 1, 2012, at 10:44 AM, Jingcha Joba wrote:

> This should not happen. Typically, intra-node communication latency is way,
> way cheaper than inter-node.
> Can you please tell us how you ran your application?
> Thanks
>
> --
> Sent from my iPhone
>
> On Jun 1, 2012, at 7:34 AM, Mudassar Majeed <mudassar...@yahoo.com> wrote:
>
>> Dear MPI people,
>> Can someone tell me why MPI_Ssend takes more time when the two MPI
>> processes are on the same node? The same two processes on different nodes
>> take much less time for the same message exchange. I am using a
>> supercomputing center and this happens. I was writing an algorithm to
>> reduce across-node communication, but now I find that across-node
>> communication is cheaper than communication within a node (with 8 cores
>> on each node).
>>
>> best regards,
>>
>> Mudassar

--
Jeff Squyres
jsquy...@cisco.com
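(For reference, the OSU point-to-point benchmarks mentioned above are typically run with exactly two ranks. Assuming the osu-micro-benchmarks suite is built on the system, an invocation along the lines of

    mpirun -np 2 -mca btl self,sm,tcp ./osu_latency

prints the latency for a range of message sizes; running it once with both ranks on one node and once with the ranks on different nodes, via the batch system's placement options, gives the within-node/across-node comparison directly.)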