I wonder if this is related to memory pinning. Can you try turning off leave-pinned and see whether the problem persists? This may affect performance, but should avoid the crash:

mpirun ... --mca mpi_leave_pinned 0 ...
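If the mpirun line is awkward to edit (for example, it is buried in a batch script), the same MCA parameter can, as far as I know, also be set through the environment using Open MPI's OMPI_MCA_ prefix. A minimal sketch, assuming a bash-like shell; the process count and ./solver are placeholders, not your actual job:

```shell
# Disable leave-pinned for every mpirun launched from this shell.
# Intended to be equivalent to passing "--mca mpi_leave_pinned 0".
export OMPI_MCA_mpi_leave_pinned=0

# Placeholder launch line: the process count and ./solver are hypothetical.
NP=64
SOLVER=./solver

if command -v mpirun >/dev/null 2>&1 && [ -x "$SOLVER" ]; then
    mpirun -np "$NP" "$SOLVER"
else
    # Nothing to launch here; just show what would have been run.
    echo "mpirun or $SOLVER not found; would run: mpirun -np $NP $SOLVER"
fi
```

Setting it on the command line, as above, is the more visible option; the environment variable is only a convenience when you cannot easily change the launch command.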
Also it looks like Smoky has a slightly newer version of the 1.4 branch that you should try to switch to if you can. The following command will show you all of the available installs on that machine:

shell$ module avail ompi

For a list of supported compilers for that version try the 'show' option:

shell$ module show ompi/1.4.3
-------------------------------------------------------------------
/sw/smoky/modulefiles-centos/ompi/1.4.3:

module-whatis    This module configures your environment to make Open MPI 1.4.3 available.
Supported Compilers:
  pathscale/3.2.99
  pathscale/3.2
  pgi/10.9
  pgi/10.4
  intel/11.1.072
  gcc/4.4.4
  gcc/4.4.3
-------------------------------------------------------------------

Let me know if that helps.

Josh

On Wed, Jun 22, 2011 at 4:16 AM, Mathieu Gontier <mathieu.gont...@gmail.com> wrote:
> Dear all,
>
> First of all, all my apologies because I post this message to both the bug
> and user mailing list. But for the moment, I do not know if it is a bug!
>
> I am running a CFD structured flow solver at ORNL, and I have an access to a
> small cluster (Smoky) using OpenMPI-1.4.2 with Infiniband by default.
> Recently we increased the size of our models, and since that time we have
> run into many infiniband related problems. The most serious problem is a
> hard crash with the following error message:
>
> [smoky45][[60998,1],32][/sw/sources/ompi/1.4.2/ompi/mca/btl/openib/connect/btl_openib_connect_oob.c:464:qp_create_one]
> error creating qp errno says Cannot allocate memory
>
> If we force the solver to use ethernet (mpirun -mca btl ^openib) the
> computations works correctly, although very slowly (a single iteration take
> ages). Do you have any idea what could be causing these problems?
>
> If it is due to a bug or a limitation into OpenMPI, do you think the version
> 1.4.3, the coming 1.4.4 or any 1.5 version could solve the problem? I read
> the release notes, but I did not read any obvious patch which could fix my
> problem. The system administrator is ready to compile a new package for us,
> but I do not want to ask to install to many of them.
>
> Thanks.
> --
>
> Mathieu Gontier
> skype: mathieu_gontier
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>

--
Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey