Dear all,

First of all, my apologies for posting this message to both the bug and user mailing lists. At the moment, I do not know whether it is a bug!

I am running a structured CFD flow solver at ORNL, and I have access to a small cluster (Smoky) that uses OpenMPI-1.4.2 with InfiniBand by default. Recently we increased the size of our models, and since then we have run into many InfiniBand-related problems. The most serious one is a hard crash with the following error message:

[smoky45][[60998,1],32][/sw/sources/ompi/1.4.2/ompi/mca/btl/openib/connect/btl_openib_connect_oob.c:464:qp_create_one] error creating qp errno says Cannot allocate memory

If we force the solver to use Ethernet (mpirun -mca btl ^openib), the computation works correctly, although very slowly (a single iteration takes ages). Do you have any idea what could be causing these problems?

If it is due to a bug or a limitation in OpenMPI, do you think version 1.4.3, the upcoming 1.4.4, or any 1.5 version could solve the problem? I read the release notes, but I did not see any obvious fix for my problem. The system administrator is ready to compile a new package for us, but I do not want to ask for too many of them.

Thanks.
--
Mathieu Gontier
skype: mathieu_gontier
