Running with shared memory enabled gave me the following error:

mpprun INFO: Starting openmpi run on 2 nodes (16 ranks)...
--------------------------------------------------------------------------
A requested component was not found, or was unable to be opened.  This
means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded).  Note that
Open MPI stopped checking at the first component that it did not find.

Host:      n568
Framework: btl
Component: tcp
--------------------------------------------------------------------------

Maybe it is not installed at our supercomputing center. What do you suggest?

best regards,


----- Forwarded Message -----
From: Mudassar Majeed <mudassar...@yahoo.com>
To: Jeff Squyres <jsquy...@cisco.com> 
Sent: Friday, June 1, 2012 5:03 PM
Subject: Re: [OMPI users] Intra-node communication
 

Here is the code. I am taking care of the first message: I start measuring the 
round-trip time from the second message onward. As you can see in the code, I do 
100 handshakes and measure the overall time for them. I have two nodes, each with 
8 cores. First I exchange messages between process 1 and process 2, which are on 
the same node, and measure the time. Then I exchange messages between process 1 
and process 12, which are on different nodes. The output I got is as follows:

---------------------------------------------------------------------------------

mpprun INFO: Starting openmpi run on 2 nodes (16 ranks)...

with-in node: time = 150.663382 secs
across nodes: time = 134.627887 secs
---------------------------------------------------------------------------------


The code is as follows:


    // Assumed context (not shown in this excerpt): my_rank comes from
    // MPI_Comm_rank(), and the timing/status variables are declared as below.
    struct timespec stime, etime;
    MPI_Status status;

    double *buff = NULL;
    double ex_time = 0.0;

    buff = new double[1000000];

    for (int i = 0; i < 1000000; i++)
        buff[i] = 100.5352;

    MPI_Barrier(MPI_COMM_WORLD);

    int comm_amount = 100;  // *(comm + my_rank * N + i);

    if (comm_amount > 0)
    {
        // Ping-pong between ranks 1 and 2 (same node); the first
        // iteration (j == 0) is excluded from the timing.
        if (my_rank == 1)
        {
            for (int j = 0; j < comm_amount; j++)
            {
                if (j > 0)
                    clock_gettime(CLOCK_REALTIME, &stime);

                MPI_Ssend((void*)buff, 1000000, MPI_DOUBLE, 2, 4600, MPI_COMM_WORLD);
                MPI_Recv((void*)buff, 1000000, MPI_DOUBLE, 2, 4600, MPI_COMM_WORLD, &status);

                if (j > 0)
                {
                    clock_gettime(CLOCK_REALTIME, &etime);
                    ex_time += (etime.tv_sec - stime.tv_sec)
                             + 1e-9 * (etime.tv_nsec - stime.tv_nsec);
                }
            }
        }
        else if (my_rank == 2)
        {
            for (int j = 0; j < comm_amount; j++)
            {
                if (j > 0)
                    clock_gettime(CLOCK_REALTIME, &stime);

                MPI_Recv((void*)buff, 1000000, MPI_DOUBLE, 1, 4600, MPI_COMM_WORLD, &status);
                MPI_Ssend((void*)buff, 1000000, MPI_DOUBLE, 1, 4600, MPI_COMM_WORLD);

                if (j > 0)
                {
                    clock_gettime(CLOCK_REALTIME, &etime);
                    ex_time += (etime.tv_sec - stime.tv_sec)
                             + 1e-9 * (etime.tv_nsec - stime.tv_nsec);
                }
            }
        }

        if (my_rank == 1)
            printf("\nwith-in node: time = %f\n", ex_time);

        ex_time = 0.0;

        // Same ping-pong, but between ranks 1 and 12 (different nodes).
        if (my_rank == 1)
        {
            for (int j = 0; j < comm_amount; j++)
            {
                if (j > 0)
                    clock_gettime(CLOCK_REALTIME, &stime);

                MPI_Ssend((void*)buff, 1000000, MPI_DOUBLE, 12, 4600, MPI_COMM_WORLD);
                MPI_Recv((void*)buff, 1000000, MPI_DOUBLE, 12, 4600, MPI_COMM_WORLD, &status);

                if (j > 0)
                {
                    clock_gettime(CLOCK_REALTIME, &etime);
                    ex_time += (etime.tv_sec - stime.tv_sec)
                             + 1e-9 * (etime.tv_nsec - stime.tv_nsec);
                }
            }
        }
        else if (my_rank == 12)
        {
            for (int j = 0; j < comm_amount; j++)
            {
                if (j > 0)
                    clock_gettime(CLOCK_REALTIME, &stime);

                MPI_Recv((void*)buff, 1000000, MPI_DOUBLE, 1, 4600, MPI_COMM_WORLD, &status);
                MPI_Ssend((void*)buff, 1000000, MPI_DOUBLE, 1, 4600, MPI_COMM_WORLD);

                if (j > 0)
                {
                    clock_gettime(CLOCK_REALTIME, &etime);
                    ex_time += (etime.tv_sec - stime.tv_sec)
                             + 1e-9 * (etime.tv_nsec - stime.tv_nsec);
                }
            }
        }

        if (my_rank == 1)
            printf("\nacross nodes: time = %f\n", ex_time);
    }



This time I have added -mca btl self,sm,tcp; maybe it will enable shared memory 
support. But I had to do it with mpprun (not mpirun), as I have to submit a job 
and can't use mpirun directly on the supercomputer.

thanks for your help,

best 





________________________________
 From: Jeff Squyres <jsquy...@cisco.com>
To: Open MPI Users <us...@open-mpi.org> 
Cc: Mudassar Majeed <mudassar...@yahoo.com> 
Sent: Friday, June 1, 2012 4:52 PM
Subject: Re: [OMPI users] Intra-node communication
 
...and exactly how you measured.  You might want to run a well-known benchmark, 
like NetPIPE or the OSU pt2pt benchmarks.

Note that the *first* send between any given peer pair is likely to be slow 
because OMPI does a lazy connection scheme (i.e., the connection is made behind 
the scenes).  Subsequent sends are likely faster.  Well-known benchmarks do a 
bunch of warmup sends and then start timing after those are all done.
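
[To illustrate that warmup-then-time pattern, here is a minimal ping-pong sketch; 
it is not from the original thread, and the peer ranks, message size, tag, and 
warmup/iteration counts are arbitrary assumptions. It does a few untimed exchanges 
first, so the lazy connection setup is excluded, and then times the rest with 
MPI_Wtime.]

    #include <mpi.h>
    #include <stdio.h>

    #define N_ELEM 1000000   /* doubles per message (assumption)       */
    #define WARMUP 5         /* untimed exchanges to set up the path   */
    #define ITERS  100       /* timed round trips                      */

    int main(int argc, char **argv)
    {
        int rank;
        static double buf[N_ELEM];
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Ping-pong between ranks 0 and 1; other ranks just wait. */
        if (rank == 0 || rank == 1) {
            int peer = 1 - rank;
            double t0 = 0.0;

            for (int j = 0; j < WARMUP + ITERS; j++) {
                if (j == WARMUP)          /* start timing only after warmup */
                    t0 = MPI_Wtime();

                if (rank == 0) {
                    MPI_Ssend(buf, N_ELEM, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD);
                    MPI_Recv(buf, N_ELEM, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &status);
                } else {
                    MPI_Recv(buf, N_ELEM, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &status);
                    MPI_Ssend(buf, N_ELEM, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD);
                }
            }

            if (rank == 0)
                printf("avg round trip: %f secs\n",
                       (MPI_Wtime() - t0) / ITERS);
        }

        MPI_Barrier(MPI_COMM_WORLD);
        MPI_Finalize();
        return 0;
    }

[Averaging many timed round trips after the warmup avoids charging the one-time 
connection setup to the measurement.]
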

Also ensure that you have shared memory support enabled.  It is likely to be 
enabled by default, but if you're seeing different performance than you expect, 
that's something to check.


On Jun 1, 2012, at 10:44 AM, Jingcha Joba wrote:

> This should not happen. Typically, intra-node communication latency is much 
> lower than inter-node latency.
> Can you please tell us how you ran your application?
> Thanks 
> 
> --
> Sent from my iPhone
> 
> On Jun 1, 2012, at 7:34 AM, Mudassar Majeed <mudassar...@yahoo.com> wrote:
> 
>> Dear MPI people, 
>> Can someone tell me why MPI_Ssend takes more time when two MPI processes are 
>> on the same node? The same two processes on different nodes take much less 
>> time for the same message exchange. I am using a supercomputing center, and 
>> this is what happens. I was writing an algorithm to reduce across-node 
>> communication, but now I find that across-node communication is cheaper than 
>> communication within a node (with 8 cores on each node).
>> 
>> best regards,
>> 
>> Mudassar


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
