Thank you, Jeff Squyres. Could you suggest a method for running layer 0 diagnostics to check whether the fabric is clean? I have contacted the local Dell office (Taiwan), but I don't think they are familiar with Open MPI, or even with the InfiniBand module. Has anyone else seen the IB stack hang with Mellanox ConnectX products?

Thank you again.

Best Regards,

Gloria Jan
Wavelink Technology Inc


I can confirm that I have exactly the same problem, also on a Dell
system, even with the latest Open MPI.

Our system is:

Dell M905
OpenSUSE 11.1
kernel: 2.6.27.21-0.1-default
ofed-1.4-21.12 from SUSE repositories.
OpenMPI-1.3.2


What I can also add is that it does not only affect Open MPI. If
messages like this one are triggered after mpirun:
[node032][[9340,1],11][btl_openib_component.c:3002:poll_device] error polling HP CQ with -2 errno says Success

then the IB stack hangs. You cannot even reload it; you have to reboot
the node.
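
To find out which nodes have hit this error (and so probably need a
reboot), the mpirun output can be scanned for the message above. The
following is only a rough sketch: the regular expression is derived
from the single log line quoted here, and the function names
(parse_cq_error, affected_hosts) and the field names "job", "sub" and
"rank" are hypothetical labels, not Open MPI terminology.

```python
import re

# Pattern derived from the one log line quoted in this thread; other
# Open MPI versions may format the message differently (assumption).
CQ_ERROR = re.compile(
    r"\[(?P<host>[^\]]+)\]"                                  # [node032]
    r"\[\[(?P<job>\d+),(?P<sub>\d+)\],(?P<rank>\d+)\]"       # [[9340,1],11]
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\]"  # [file:line:func]
    r"\s+error polling (?P<cq>\w+) CQ with (?P<status>-?\d+)"
    r" errno says (?P<errno_text>.*)"
)


def parse_cq_error(line):
    """Return a dict of fields if the line is a CQ polling error, else None."""
    match = CQ_ERROR.search(line)
    if match is None:
        return None
    record = match.groupdict()
    # Convert the numeric fields from strings to integers.
    for key in ("job", "sub", "rank", "line", "status"):
        record[key] = int(record[key])
    record["errno_text"] = record["errno_text"].strip()
    return record


def affected_hosts(lines):
    """Return the sorted, de-duplicated host names that reported the error."""
    records = (parse_cq_error(line) for line in lines)
    return sorted({rec["host"] for rec in records if rec})
```

For example, affected_hosts(open("mpirun.log")) would list the nodes
that printed the error, where "mpirun.log" is a placeholder for
wherever the job output was saved.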



Something that severe should not be able to be caused by Open MPI.
Specifically: Open MPI should not be able to hang the OFED stack.
Have you run layer 0 diagnostics to know that your fabric is clean?
You might want to contact your IB vendor to find out how to do that.
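
As a possible starting point before the vendor replies, the standard
OFED diagnostic utilities can be run on a node to look at port state
and error counters. The sketch below is an assumption about what is
useful here, not a vendor-approved layer 0 procedure: it only checks
which of a few well-known tools (ibstat, ibv_devinfo, ibcheckerrors,
ibdiagnet) are on the PATH and runs them without arguments. Which
tools are installed depends on the OFED packages, and the helper names
plan_checks and run_checks are hypothetical.

```python
import shutil
import subprocess

# Candidate tools shipped with OFED (infiniband-diags, libibverbs, ibutils).
# Whether each one is installed on a given node is an assumption.
CANDIDATE_TOOLS = [
    ("ibstat", "local HCA and port state"),
    ("ibv_devinfo", "verbs-level device information"),
    ("ibcheckerrors", "port error counters across the fabric"),
    ("ibdiagnet", "fabric-wide topology and link checks"),
]


def plan_checks(which=shutil.which):
    """Split the candidate tools into (available, missing) by PATH lookup."""
    available, missing = [], []
    for tool, purpose in CANDIDATE_TOOLS:
        target = available if which(tool) else missing
        target.append((tool, purpose))
    return available, missing


def run_checks(available, timeout=300):
    """Run each available tool with no arguments and collect its output.

    Returns {tool: (returncode, combined_output)}; returncode is None
    when the tool did not finish within `timeout` seconds.
    """
    results = {}
    for tool, _purpose in available:
        try:
            proc = subprocess.run(
                [tool], capture_output=True, text=True, timeout=timeout
            )
            results[tool] = (proc.returncode, proc.stdout + proc.stderr)
        except subprocess.TimeoutExpired:
            results[tool] = (None, "timed out")
    return results
```

Calling plan_checks() and passing the first list it returns to
run_checks() gives the raw output of each tool for inspection. Ports
that are not Active, or error counters that keep rising, would be
things to raise with the IB vendor.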

--
Jeff Squyres
Cisco Systems



------------------------------

_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

End of users Digest, Vol 1217, Issue 2
**************************************

