You can use a tool such as ibdiagnet (http://linux.die.net/man/1/ibdiagnet) to debug your IB network problems. Most likely there is a bad cable or connector somewhere in the fabric, and the tool should be able to pinpoint it.
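Before walking the fabric, it may help to know which host pairs actually report the error. The following is a minimal Python sketch, not part of Open MPI or ibdiagnet, that tallies source/destination pairs from job output. The regular expression is modeled only on the single error line quoted in this thread, so treat the pattern and the helper name `count_failing_pairs` as assumptions; other Open MPI versions may word the message differently.

```python
import re
from collections import Counter

# Pattern modeled on the one error line quoted in this thread (an assumption:
# other Open MPI versions may format the message differently).
WC_ERROR = re.compile(
    r"from (?P<src>\S+) to: (?P<dst>\S+) error polling \S+ CQ "
    r"with status (?P<status>.+?) status number (?P<code>\d+)"
)


def count_failing_pairs(lines):
    """Count (source host, destination host, status) triples in job output."""
    pairs = Counter()
    for line in lines:
        match = WC_ERROR.search(line)
        if match:
            pairs[(match["src"], match["dst"], match["status"])] += 1
    return pairs


# The error line reported in the original message.
sample = (
    "[[61410,1],65][btl_openib_component.c:3238:handle_wc] "
    "from bng1aviationdc22 to: bng1aviationdc26 error polling LP CQ "
    "with status RETRY EXCEEDED ERROR status number 12 "
    "for wr_id 774739584 opcode 1 vendor error 129 qp_idx 0"
)

for (src, dst, status), n in count_failing_pairs([sample]).most_common():
    print(f"{src} -> {dst}: {status} (seen {n} time(s))")
```

If the same host shows up in most of the failing pairs, its cable and port are a reasonable first place to look, though the count alone does not prove where the fault is.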
Pavel (Pasha) Shamis
---
Computer Science Research Group
Computer Science and Math Division
Oak Ridge National Laboratory

On Jun 17, 2013, at 9:41 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:

That sounds like there's a problem with your InfiniBand fabric. You should run a complete level-0 diagnostic on your IB network.

On Jun 17, 2013, at 5:23 AM, "Singh, Bharati (GE Global Research, consultant)" <bharati.si...@ge.com> wrote:

Hi Team,

Our users' jobs are hanging, and we see the errors below.

[[61410,1],65][btl_openib_component.c:3238:handle_wc] from bng1aviationdc22 to: bng1aviationdc26 error polling LP CQ with status RETRY EXCEEDED ERROR status number 12 for wr_id 774739584 opcode 1 vendor error 129 qp_idx 0

Please find the attached file for more information.

Thanks,
Bharati Singh

<output.14807.zip>
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/