I wonder if this is related to memory pinning. Can you try turning off
the mpi_leave_pinned MCA parameter and see if the problem persists? This
may cost some performance, but should avoid the crash:
  mpirun ... --mca mpi_leave_pinned 0 ...
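
If editing the mpirun line is awkward (for example, inside a batch
script), the same parameter can also be set through the environment,
since Open MPI reads any variable named OMPI_MCA_<param>. A minimal
sketch, assuming a Bourne-style shell; the ompi_info check is optional,
and on releases newer than 1.4 it may need extra options to list this
parameter:

```shell
# Equivalent to passing "--mca mpi_leave_pinned 0" on the mpirun command line:
# Open MPI picks up any environment variable named OMPI_MCA_<param>.
export OMPI_MCA_mpi_leave_pinned=0

# Optional: show the value Open MPI reports for the parameter.
# Skipped when ompi_info is not on the PATH.
if command -v ompi_info >/dev/null 2>&1; then
    ompi_info --param mpi all | grep leave_pinned
fi
```

With the variable exported, a plain 'mpirun ...' started from the same
shell should pick up the setting without the --mca flag.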

Also, it looks like Smoky has a slightly newer version of the 1.4
branch installed, which you should switch to if you can. The following
command will show you all of the available installs on that machine:
  shell$ module avail ompi

For a list of supported compilers for that version, try the 'show' option:
  shell$ module show ompi/1.4.3
-------------------------------------------------------------------
/sw/smoky/modulefiles-centos/ompi/1.4.3:

module-whatis    This module configures your environment to make Open
MPI 1.4.3 available.
Supported Compilers:
     pathscale/3.2.99
     pathscale/3.2
     pgi/10.9
     pgi/10.4
     intel/11.1.072
     gcc/4.4.4
     gcc/4.4.3
-------------------------------------------------------------------
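
Switching to that install could then look roughly like the sketch
below. The module names come from the listing above; the choice of gcc
is only an example, and the two variable names are made up for
illustration. If an ompi module is already loaded in your environment
you will probably need 'module unload' or 'module swap' first, which
this sketch does not handle:

```shell
# Module names taken from the 'module show' listing; adjust the compiler
# to whichever supported one you actually build with (gcc is an assumption).
COMPILER_MODULE=gcc/4.4.4
OMPI_MODULE=ompi/1.4.3

# Only attempt the switch where the 'module' command is available.
if command -v module >/dev/null 2>&1; then
    module load "$COMPILER_MODULE"
    module load "$OMPI_MODULE"
    module list          # confirm what is loaded
    mpirun --version     # should now report the 1.4.3 install
fi
```

Remember to rebuild the solver against the newly loaded Open MPI before
rerunning the failing case.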

Let me know if that helps.

Josh


On Wed, Jun 22, 2011 at 4:16 AM, Mathieu Gontier
<mathieu.gont...@gmail.com> wrote:
> Dear all,
>
> First of all, my apologies for posting this message to both the bug
> and user mailing lists. For the moment, I do not know whether it is a bug!
>
> I am running a CFD structured flow solver at ORNL, and I have access to a
> small cluster (Smoky) using OpenMPI-1.4.2 with InfiniBand by default.
> Recently we increased the size of our models, and since then we have
> run into many InfiniBand-related problems.  The most serious problem is a
> hard crash with the following error message:
>
> [smoky45][[60998,1],32][/sw/sources/ompi/1.4.2/ompi/mca/btl/openib/connect/btl_openib_connect_oob.c:464:qp_create_one]
> error creating qp errno says Cannot allocate memory
>
> If we force the solver to use Ethernet (mpirun -mca btl ^openib), the
> computations work correctly, although very slowly (a single iteration takes
> ages). Do you have any idea what could be causing these problems?
>
> If it is due to a bug or a limitation in OpenMPI, do you think version
> 1.4.3, the upcoming 1.4.4, or any 1.5 version could solve the problem? I read
> the release notes, but I did not see any obvious fix for my
> problem. The system administrator is ready to compile a new package for us,
> but I do not want to ask him to install too many of them.
>
> Thanks.
> --
>
> Mathieu Gontier
> skype: mathieu_gontier
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>



-- 
Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey
