On Jun 20, 2012, at 3:36 PM, Martin Siegert wrote:

> by now we know of three programs - dirac, wrf, quantum espresso - that
> all hang with openmpi-1.4.x (have not yet checked with openmpi-1.6).
> All of these programs run to completion with the mpiexec commandline
> argument: --mca btl_openib_flags 305
> We now set this in the global configuration file openmpi-mca-params.conf.
> What is the reason that this is not the default in the first place?
> Are there any negative effects?

Two things:

1. These flags -- 305 (or 0x131 or 0001 0011 0001) translate to telling the 
openib BTL the following:

- 1: SEND: meaning that the openib BTL is using send/receive semantics
- 16: ACK: meaningless with the ob1 PML
- 32: CHECKSUM: meaningless with the ob1 PML
- 256: meaningless

What's meaningful here is what is missing: RDMA PUT and GET.  So all RDMA 
support is disabled.
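
If you want to check the arithmetic yourself, here is a small sketch that 
decodes the value bit by bit.  The names for 1, 16, and 32 are the ones listed 
above; the values 2 (PUT) and 4 (GET) are my assumption about how the RDMA bits 
are numbered, and 256 is left unnamed because it has no meaning here.

```python
# Bit names as listed in this message. PUT=2 and GET=4 are assumed values
# for the RDMA bits; they are not spelled out in the list above.
FLAG_NAMES = {
    1: "SEND",
    2: "PUT",
    4: "GET",
    16: "ACK",
    32: "CHECKSUM",
}


def decode_flags(value):
    """Return (bit, name) pairs for every bit set in value, lowest bit first."""
    result = []
    bit = 1
    while bit <= value:
        if value & bit:
            result.append((bit, FLAG_NAMES.get(bit, "unnamed")))
        bit <<= 1
    return result


flags = 305  # same as 0x131
set_bits = decode_flags(flags)
for bit, name in set_bits:
    print(f"{bit:>3}: {name}")

# RDMA is in use only if the PUT or GET bit is set.
rdma_enabled = bool(flags & (2 | 4))
print("RDMA PUT/GET enabled:", rdma_enabled)
```

Neither the 2 bit nor the 4 bit is set in 305, which is the whole effect of 
the setting.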

This will work fine, but you may want to increase your 
btl_openib_eager_limit size (e.g., U. Michigan did the same thing as you -- 
disabled RDMA -- but increased the eager limit to 64k to get back some of the 
lost performance).
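
As a rough sketch of what that combination looks like on the mpiexec command 
line, the snippet below just assembles the arguments.  It assumes the parameter 
is spelled btl_openib_eager_limit (the same form as btl_openib_flags above) and 
that 64k means 65536 bytes; the helper name, the program name ./my_app, and the 
process count are placeholders, not anything from your setup.

```python
import shlex


def mpiexec_command(program, nprocs, mca_params):
    """Build an mpiexec argument list with one --mca pair per parameter.

    Hypothetical helper for illustration only.
    """
    argv = ["mpiexec", "-n", str(nprocs)]
    for name, value in mca_params.items():
        argv += ["--mca", name, str(value)]
    argv.append(program)
    return argv


params = {
    "btl_openib_flags": 305,                # send/receive only, no RDMA PUT/GET
    "btl_openib_eager_limit": 64 * 1024,    # assumed: 64k taken as 65536 bytes
}

cmd = mpiexec_command("./my_app", 4, params)  # placeholder program and count
print(shlex.join(cmd))
```

The same two name/value pairs could go in openmpi-mca-params.conf instead, 
where you already set the flags.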

2. We believe that we have *finally* (just recently) fixed this issue in the 
SVN trunk and upcoming 1.6.1 release.  I have a test pre-release 1.6.1 tarball 
-- would you mind giving it a whirl?

http://www.open-mpi.org/~jsquyres/unofficial/openmpi-1.6.1ticket3131r26612M.tar.bz2

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

