Dear all,
First of all, my apologies for posting this message to both the bug and
user mailing lists; for the moment, I do not know whether it is a bug.
I am running a structured CFD flow solver at ORNL, and I have access
to a small cluster (Smoky) that uses OpenMPI-1.4.2 with InfiniBand by
default. Recently we increased the size of our models, and since then
we have run into many InfiniBand-related problems. The most
serious one is a hard crash with the following error message:
[smoky45][[60998,1],32][/sw/sources/ompi/1.4.2/ompi/mca/btl/openib/connect/btl_openib_connect_oob.c:464:qp_create_one]
error creating qp errno says Cannot allocate memory
If we force the solver to use Ethernet (mpirun -mca btl ^openib), the
computation works correctly, although very slowly (a single iteration
takes ages). Do you have any idea what could be causing these problems?
If it is due to a bug or a limitation in OpenMPI, do you think that
version 1.4.3, the upcoming 1.4.4, or any 1.5 version could solve the
problem? I read the release notes, but I did not see any obvious fix
that would address my problem. The system administrator is ready to build
a new package for us, but I do not want to ask for too many of them.
Thanks.
--
Mathieu Gontier
skype: mathieu_gontier