On Apr 4, 2008, at 2:47 PM, Matt Hughes wrote:
I was able to eliminate the hang I was seeing with 1.2.5 during the
gather operation by using these BTL parameters (found at
http://svn.open-mpi.org/trac/ompi/browser/trunk/ompi/mca/btl/openib/btl-openib-benchmark):
btl_openib_max_btls=20
btl_openib_rd_num=128
btl_openib_rd_low=75
btl_openib_rd_win=50
btl_openib_max_eager_rdma=32
mpool_base_use_mem_hooks=1
mpi_leave_pinned=1
Only the btl_openib_rd_low=75 and btl_openib_rd_num=128 parameters are
necessary to avoid the hang.
The information given for the parameters in ompi_info is not very
helpful. Can anyone explain (or point me to a reference) what these
parameters do and how they affect collective operations?
Yes (btl_openib_ prefix omitted for brevity):
max_btls: The maximum number of active IB ports that Open MPI will use
in each MPI process.
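As an aside, ompi_info can at least dump all of the openib BTL
parameters with their help strings, and any of them can be overridden
on the mpirun command line.  A rough sketch (my_mpi_app and the
process count are just placeholders):

  # List the openib BTL parameters, their descriptions, and default values
  ompi_info --param btl openib

  # Override a parameter for a single run, e.g., restrict each process
  # to one active IB port
  mpirun --mca btl_openib_max_btls 1 -np 4 ./my_mpi_app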
rd_num: Number of per-peer receive buffers posted when a connection is
made between two MPI processes. I.e., the first time you MPI_SEND/
MPI_RECV between a pair of MPI peers, rd_num buffers are posted for
incoming messages. More on this below.
rd_low: When the number of available receive buffers on a per-peer queue
pair drops to this number (the low watermark), Open MPI posts more.
rd_win: When the number of available receive buffers on a per-peer queue
pair drops to this number, Open MPI sends a flow control message to the peer.
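For example, the receive buffer settings from your report could be
passed on the mpirun command line; a sketch (the process count and
executable name are placeholders):

  # Post 128 buffers per peer, repost when only 75 remain,
  # and send flow control updates based on a window of 50
  mpirun --mca btl_openib_rd_num 128 \
         --mca btl_openib_rd_low 75 \
         --mca btl_openib_rd_win 50 \
         -np 16 ./my_mpi_app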
max_eager_rdma: How many buffers to post for "eager" RDMA short
messages between explicit pairs of MPI processes. Note that eager
RDMA is only used between a fixed number of pairs of peers in order to
a) conserve registered memory and b) limit the number of memory
locations that must be polled to check for message passing progress.
Check out this [relatively new] FAQ entry for more details: http://www.open-mpi.org/faq/?category=openfabrics#ib-small-message-rdma
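The same kind of run-time override applies to the eager RDMA setting
from the benchmark file; a sketch (placeholders as before, and the
exact parameter names can always be double-checked with ompi_info):

  # Allow up to 32 eager RDMA connections per process; there is also an
  # on/off switch for eager RDMA (btl_openib_use_eager_rdma, if memory serves)
  mpirun --mca btl_openib_max_eager_rdma 32 -np 16 ./my_mpi_app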
mpool_base_use_mem_hooks: If Open MPI was compiled with support for
memory hooks (which is usually the default), this allows the use of
the mpi_leave_pinned parameter.
mpi_leave_pinned: The simple description of this parameter is that if
your application repeatedly sends and receives from the same buffers,
enabling mpi_leave_pinned will likely result in a performance boost.
Check out these [relatively new] FAQ entries for more details:
http://www.open-mpi.org/faq/?category=openfabrics#large-message-tuning-1.2
and http://www.open-mpi.org/faq/?category=openfabrics#large-message-leave-pinned
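If you settle on values that you like, you don't have to retype them
for every run; they can be made persistent.  A sketch, assuming the
standard per-user MCA parameter file location:

  # $HOME/.openmpi/mca-params.conf -- picked up by every Open MPI job you launch
  mpi_leave_pinned = 1
  mpool_base_use_mem_hooks = 1
  btl_openib_rd_num = 128
  btl_openib_rd_low = 75

Any MCA parameter can also be set as an environment variable by
prefixing its name with OMPI_MCA_, e.g., export OMPI_MCA_mpi_leave_pinned=1.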
Note that long message tuning parameters are changing slightly in the
upcoming v1.3 series. Check out this FAQ entry:
http://www.open-mpi.org/faq/?category=openfabrics#large-message-tuning-1.3
Does this help? Sorry it took so long to answer your questions;
please feel free to ask more.
--
Jeff Squyres
Cisco Systems