You can use a tool such as ibdiagnet (http://linux.die.net/man/1/ibdiagnet)
to debug your IB network problems. Most likely you have a bad cable or
connector somewhere in the network.
The tool should be able to pinpoint the problem.
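
To decide which nodes to look at first, it may help to tally the host pairs that appear in the Open MPI error output. Here is a minimal sketch in Python, assuming the messages look like the single one quoted further down in this thread; other Open MPI versions may format them differently. The names count_failing_pairs and ERROR_RE are hypothetical helpers for illustration, not part of Open MPI or ibdiagnet.

```python
import re
from collections import Counter

# Pattern modeled on the one error line quoted in this thread (assumption:
# other Open MPI releases may word or order the fields differently).
ERROR_RE = re.compile(
    r"from\s+(?P<src>\S+)\s+to:\s+(?P<dst>\S+)\s+error polling \S+ CQ "
    r"with status\s+(?P<status>.+?)\s+status\s+number\s+(?P<num>\d+)"
)


def count_failing_pairs(log_text):
    """Return a Counter keyed by (source host, destination host, status)."""
    # Mail clients and terminals may wrap the message across lines,
    # so collapse all runs of whitespace before matching.
    flat = " ".join(log_text.split())
    return Counter(
        (m.group("src"), m.group("dst"), m.group("status"))
        for m in ERROR_RE.finditer(flat)
    )


# The error text from this thread, wrapped the same way it arrived by mail.
SAMPLE = """[[61410,1],65][btl_openib_component.c:3238:handle_wc] from bng1aviationdc22 to:
bng1aviationdc26 error polling LP CQ with status RETRY EXCEEDED ERROR status
number 12 for wr_id 774739584 opcode 1  vendor error 129 qp_idx 0"""

if __name__ == "__main__":
    for (src, dst, status), n in count_failing_pairs(SAMPLE).most_common():
        print(f"{src} -> {dst}: {status} x{n}")
```

If one host shows up in most of the failing pairs, its cable and port are a reasonable place to start checking. The tally only narrows the search; it does not replace a fabric diagnostic.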

Pavel (Pasha) Shamis
---
Computer Science Research Group
Computer Science and Math Division
Oak Ridge National Laboratory

On Jun 17, 2013, at 9:41 AM, Jeff Squyres (jsquyres)
<jsquy...@cisco.com> wrote:

That sounds like there's a problem with your InfiniBand fabric.

You should run a complete level-0 diagnostic on your IB network.


On Jun 17, 2013, at 5:23 AM, "Singh, Bharati (GE Global Research, consultant)"
<bharati.si...@ge.com> wrote:

Hi Team,

Our users' jobs are hanging, and we see the errors below.

[[61410,1],65][btl_openib_component.c:3238:handle_wc] from bng1aviationdc22 to: 
bng1aviationdc26 error polling LP CQ with status RETRY EXCEEDED ERROR status 
number 12 for wr_id 774739584 opcode 1  vendor error 129 qp_idx 0

Please find the attached file for more information.

Thanks,
Bharati Singh


<output.14807.zip>
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

