Hi,

The code was pretty simple: I was sending 8 MB of data from one rank to another in a loop (say 1000 iterations), then taking the average of the time taken and calculating the bandwidth from it. I tried this logic both with mpirun plus MCA parameters and without any parameters, and to my surprise the performance degraded whenever I tried to tune.
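For reference, the test was essentially the following (a minimal sketch of what I ran; error checking trimmed):

/* Sketch of the test: rank 0 sends 8 MB to rank 1 in a loop and
 * reports the average time per message and the bandwidth.        */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define MSG_SIZE (8 * 1024 * 1024)   /* 8 MB per message           */
#define ITERS    1000                /* iterations to average over */

int main(int argc, char **argv)
{
    int rank;
    char *buf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    buf = malloc(MSG_SIZE);

    MPI_Barrier(MPI_COMM_WORLD);                  /* line up both ranks */
    double t0 = MPI_Wtime();
    for (int i = 0; i < ITERS; i++) {
        if (rank == 0)
            MPI_Send(buf, MSG_SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
        else if (rank == 1)
            MPI_Recv(buf, MSG_SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
    }
    double avg = (MPI_Wtime() - t0) / ITERS;      /* average per message */

    if (rank == 0)
        printf("avg %.6f s  ->  %.2f MB/s\n",
               avg, (MSG_SIZE / (1024.0 * 1024.0)) / avg);

    free(buf);
    MPI_Finalize();
    return 0;
}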
Now I have another question in mind: is it possible to have an IB hardware multicast implementation in Open MPI? I have gone through the issues/challenges involved, but I have also read of a couple of people who have successfully done it for Ethernet/Gigabit Ethernet and, of course, IPoIB, albeit at an experimental stage. I would actually like to contribute this to Open MPI and need help with it.

-Neeraj

On Thu, 11 Oct 2007 12:01:39 +0200, Open MPI Users wrote:

Hi Neeraj,
> Could anyone tell me the important tuning parameters in openmpi with
> IB interconnect? I tried setting eager_rdma, min_rdma_size,
> mpi_leave_pinned parameters from the mpirun command line on a 38-node
> cluster (38*2 processors), but in vain. I found simple mpirun with no
> MCA parameters performing better. I conducted the test on P2P
> send/receive with a data size of 8 MB.

The performance of the BTL with different parameters depends heavily on the code that you run. E.g., leave_pinned works very well with many microbenchmarks (e.g., bandwidth/overlap-wise) but may not perform well with real applications that use different memory regions. It's pretty much the same with the other parameters. The default values are considered best for many applications. Can you provide us any details about the code you're running to test performance?
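To illustrate the leave_pinned point, here is a made-up sketch of the two access patterns (the parameter is enabled with, e.g., mpirun --mca mpi_leave_pinned 1): pattern A is what most microbenchmarks do, pattern B is closer to many real applications.

#include <mpi.h>
#include <stdlib.h>
#include <string.h>

#define N (8 * 1024 * 1024)

/* Pattern A: microbenchmark style -- the same buffer every iteration.
 * The pinning (registration) cost is paid once; with leave_pinned the
 * cached registration is hit on every subsequent send.               */
static void send_reused(int peer, int iters)
{
    char *buf = malloc(N);
    for (int i = 0; i < iters; i++)
        MPI_Send(buf, N, MPI_CHAR, peer, 0, MPI_COMM_WORLD);
    free(buf);
}

/* Pattern B: many real applications -- a different region each time.
 * The registration cache is rarely hit, so leave_pinned buys little
 * and its bookkeeping can even cost performance.                     */
static void send_fresh(int peer, int iters)
{
    for (int i = 0; i < iters; i++) {
        char *buf = malloc(N);
        memset(buf, i & 0xff, N);
        MPI_Send(buf, N, MPI_CHAR, peer, 0, MPI_COMM_WORLD);
        free(buf);
    }
}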
> Similarly i patched HPL linpack code with libnbc (non-blocking
> collectives) and found no performance benefits. I went through its
> patch and found that it's probably not overlapping computation with
> communication.

Ah, so there are two things. LibNBC provides overlap; most overlap is achieved if memory regions are reused and leave_pinned is activated. But again, this is highly application-dependent. However, the patch for the Linpack code (I guess you refer to the patch from the LibNBC webpage [1]) is at an experimental stage (as the website says) and has not been properly tested for performance benefit. The original HPL provides something like a broadcast start and a broadcast end phase. I just replaced them with non-blocking calls to NBC_Ibcast() and did not find the time to do any performance/code analysis yet. Any input by HPL experts is appreciated!
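The intended start/compute/complete pattern looks roughly like the sketch below (sketched here with MPI_Ibcast for illustration; LibNBC's NBC_Ibcast()/NBC_Wait() follows the same structure, and do_independent_work() is just a placeholder):

#include <mpi.h>

#define N (1 << 20)

/* Placeholder for computation that does not touch the broadcast buffer. */
static void do_independent_work(double *w, int n)
{
    for (int i = 0; i < n; i++)
        w[i] *= 2.0;
}

/* "Broadcast start" and "broadcast end" with computation in between:
 * initiate the broadcast, compute while it progresses, then complete. */
static void bcast_overlapped(double *panel, double *work, MPI_Comm comm)
{
    MPI_Request req;

    MPI_Ibcast(panel, N, MPI_DOUBLE, 0, comm, &req);  /* start    */
    do_independent_work(work, N);                     /* overlap  */
    MPI_Wait(&req, MPI_STATUS_IGNORE);                /* complete */
}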
Best,
Torsten

[1]: http://www.unixer.de/research/nbcoll/hpl/

--
bash$ :(){ :|:&};:
--------------------- http://www.unixer.de/ -----
"Software Engineering is that part of Computer Science which is too
difficult for the Computer Scientist." ~ F. L. Bauer
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
