Hi Neeraj,
>        Could anyone tell me the important tuning parameters in Open MPI
>    with an IB interconnect? I tried setting the eager_rdma,
>    min_rdma_size, and mpi_leave_pinned parameters from the mpirun command
>    line on a 38-node cluster (38*2 processors) but in vain. I found that a
>    simple mpirun with no MCA parameters performed better. I conducted a
>    test on P2P send/receive with a data size of 8MB.
The performance of the BTL with different parameters depends heavily on
the code that you run. E.g., leave_pinned works very well with many
microbenchmarks (e.g., in terms of bandwidth or overlap) but may not
perform well with real applications that use many different memory
regions. It's pretty much the same with the other parameters. The default
values are considered best for many applications. Can you provide us with
any details about the code you're running to test performance?
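
To illustrate why buffer reuse matters for leave_pinned, here is a toy
model of a registration cache. It is only a sketch under stated
assumptions: the class name, the LRU eviction policy, and the capacity of
4 are hypothetical and are not taken from Open MPI's actual
implementation. It counts how often a buffer would have to be registered
(pinned) again.

```python
from collections import OrderedDict


class ToyRegistrationCache:
    """Toy LRU cache of pinned buffers (hypothetical, not Open MPI code)."""

    def __init__(self, capacity):
        self.capacity = capacity        # assumed limit on pinned regions
        self.pinned = OrderedDict()     # buffer id -> True, in LRU order
        self.registrations = 0          # number of (expensive) pin operations

    def send(self, buffer_id):
        if buffer_id in self.pinned:
            # Cache hit: the region is still pinned, no registration cost.
            self.pinned.move_to_end(buffer_id)
            return
        # Cache miss: the region has to be registered before the transfer.
        self.registrations += 1
        self.pinned[buffer_id] = True
        if len(self.pinned) > self.capacity:
            # Evict the least recently used region.
            self.pinned.popitem(last=False)


def count_registrations(buffer_ids, capacity):
    """Return how many registrations a sequence of sends would trigger."""
    cache = ToyRegistrationCache(capacity)
    for buffer_id in buffer_ids:
        cache.send(buffer_id)
    return cache.registrations


# A microbenchmark typically sends the same buffer over and over.
microbenchmark = [0] * 100
# An application may cycle through more regions than the cache can hold.
application = [i % 8 for i in range(100)]

print(count_registrations(microbenchmark, capacity=4))
print(count_registrations(application, capacity=4))
```

In this model the microbenchmark pays the registration cost once, while
the application pattern misses the cache on every send, so leaving memory
pinned buys it nothing.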

>        Similarly, I patched the HPL Linpack code with LibNBC
>    (non-blocking collectives) and found no performance benefits. I went
>    through its patch and found that it's probably not overlapping
>    computation with communication.
Ah, so there are two things. LibNBC provides overlap, and the most
overlap is achieved if memory regions are reused and leave_pinned is
activated. But again, this is highly application-dependent. However, the
patch for the Linpack code (I guess you are referring to the patch from
the LibNBC webpage [1]) is at an experimental stage (as the website says)
and has not been properly tested for performance benefits. The original
HPL provides something like a broadcast start and a broadcast end phase.
I just replaced them with non-blocking calls to NBC_Ibcast() and have not
found the time to do any performance/code analysis yet. Any input from
HPL experts is appreciated!
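
As a rough way to reason about what a start/end broadcast pair can gain,
here is a simple cost model. It is a sketch only: the function names and
the overlap_fraction parameter are hypothetical, and the numbers are made
up rather than measured on HPL.

```python
def blocking_time(steps, comm, comp):
    """Total time if every step communicates first and then computes."""
    return steps * (comm + comp)


def overlapped_time(steps, comm, comp, overlap_fraction):
    """Total time if part of the communication is hidden behind computation.

    overlap_fraction is an assumed value between 0.0 (no overlap at all)
    and 1.0 (the shorter of the two phases is hidden completely).
    """
    hidden = overlap_fraction * min(comm, comp)
    return steps * (comm + comp - hidden)


# Made-up example: 10 steps, 2.0 time units of broadcast and 3.0 of compute.
print(blocking_time(10, 2.0, 3.0))
print(overlapped_time(10, 2.0, 3.0, 1.0))   # ideal overlap
print(overlapped_time(10, 2.0, 3.0, 0.0))   # no overlap achieved
```

With overlap_fraction at 0.0 the non-blocking variant costs exactly as
much as the blocking one, which would match an observation of "no
performance benefit" if the patched code does not actually overlap.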

Best,
  Torsten

[1]: http://www.unixer.de/research/nbcoll/hpl/

-- 
 bash$ :(){ :|:&};: --------------------- http://www.unixer.de/ -----
"Software Engineering is that part of Computer Science which is too
difficult for the Computer Scientist." ~ F. L. Bauer
