Hi,
I would suggest using MXM (part of MOFED; it can also be downloaded as a
standalone RPM from http://mellanox.com/products/mxm for use with plain OFED).

It uses UD (constant memory footprint) and should provide good performance.
The next MXM v2.0 will support RC and DC (reliable UD) as well.

Once MXM is installed from the RPM (or extracted elsewhere from the RPM as a
tarball), you can point the Open MPI configure script at it with
"--with-mxm=/path/to/mxm".
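A minimal build-and-run sketch (the path is the same placeholder as above, and the
MCA selection on the run line is my assumption of how the MXM MTL is typically
enabled; check what is available with ompi_info):

  # configure Open MPI against the MXM install (placeholder path)
  ./configure --with-mxm=/path/to/mxm && make && make install

  # run with the MXM MTL over the cm PML (assumed selection flags)
  mpirun -mca pml cm -mca mtl mxm -np numproc ./app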
Regards
M


On Fri, Jul 5, 2013 at 10:33 PM, Ben <benjamin.m.a...@nasa.gov> wrote:

> I'm part of a team that maintains a global climate model running under
> MPI. Recently we have been trying it out with different MPI stacks
> at high resolution/processor counts.
> At one point in the code there is a large number of mpi_isend/mpi_recv calls
> (tens to hundreds of thousands) when data distributed across all MPI
> processes must be collected on a particular processor or processors and
> transformed to a new resolution before writing. At first the model was
> crashing with the message:
> "A process failed to create a queue pair. This usually means either the
> device has run out of queue pairs (too many connections) or there are
> insufficient resources available to allocate a queue pair (out of memory).
> The latter can happen if either 1) insufficient memory is available, or 2)
> no more physical memory can be registered with the device."
> when it hit the part of the code with the sends/receives. Watching the node's
> memory in an xterm, I could see it skyrocket and fill the node.
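For context on why this happens: with its default settings the openib BTL creates
reliable-connection queue pairs per pair of communicating processes, so a
many-to-one pattern like this can hit the adapter's QP limit. A rough way to check
that limit, assuming the standard OFED verbs utilities are installed:

  # list the device attributes whose names mention max_qp (includes the QP count limit)
  ibv_devinfo -v | grep -i max_qp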
>
> Somewhere we found a suggestion to try using the XRC queues (
> http://www.open-mpi.org/faq/?category=openfabrics#ib-xrc)
> to get around this problem, and indeed running with
>
> setenv OMPI_MCA_btl_openib_receive_queues "X,128,256,192,128:X,2048,256,128,32:X,12288,256,128,32:X,65536,256,128,32"
> mpirun --bind-to-core -np numproc ./app
>
> allowed the model to run successfully. It still seems to use a large
> amount of memory when it writes (on the order of several GB). Does anyone
> have any suggestions on how to tweak the settings to help with
> memory use?
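One hedged tweak, based on my reading of the receive_queues format documented in
the FAQ linked above (not verified on this model): the second number in each XRC
entry should be the count of receive buffers posted for that queue, so cutting the
counts for the larger buffer sizes ought to trim the openib BTL's own footprint, e.g.

  # assumed example: fewer posted buffers on the 12K and 64K XRC queues
  setenv OMPI_MCA_btl_openib_receive_queues "X,128,256,192,128:X,2048,256,128,32:X,12288,128,64,32:X,65536,64,32,16"
  mpirun --bind-to-core -np numproc ./app

That said, if most of the several GB at write time is the gathered data itself,
queue tuning will only recover a modest share of it.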
>
> --
> Ben Auer, PhD   SSAI, Scientific Programmer/Analyst
> NASA GSFC,  Global Modeling and Assimilation Office
> Code 610.1, 8800 Greenbelt Rd, Greenbelt, MD  20771
> Phone: 301-286-9176               Fax: 301-614-6246
>