You can use a tool such as ibdiagnet (http://linux.die.net/man/1/ibdiagnet)
to debug your IB network problems. Most likely there is a bad cable or
connector somewhere in the network, and the tool should be able to
pinpoint it.
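To narrow down where to look first, it can also help to tally which host
pairs show up in the error messages. Below is a minimal Python sketch of
that idea. It assumes the messages have the same shape as the one line
quoted in this thread; the regular expression and the function name
count_failing_links are my own, not part of Open MPI or of ibdiagnet, and
other Open MPI versions may word the message differently.

```python
import re
from collections import Counter

# Pattern derived from the single error line quoted in this thread.
# This is an assumption about the format, not an official specification.
ERROR_RE = re.compile(
    r"from\s+(?P<src>\S+)\s+to:\s+(?P<dst>\S+)\s+error polling \S+ CQ "
    r"with status\s+(?P<status>.+?)\s+status number\s+(?P<num>\d+)"
)


def count_failing_links(log_text):
    """Count (source host, destination host, status) triples in a job log."""
    # Mail clients and terminals wrap one message over several lines,
    # so collapse all whitespace before matching.
    flat = " ".join(log_text.split())
    return Counter(
        (m["src"], m["dst"], m["status"]) for m in ERROR_RE.finditer(flat)
    )


# The error reported in this thread, wrapped as it appeared in the mail.
sample = """
[[61410,1],65][btl_openib_component.c:3238:handle_wc] from
bng1aviationdc22 to: bng1aviationdc26 error polling LP CQ with status
RETRY EXCEEDED ERROR status number 12 for wr_id 774739584 opcode 1
vendor error 129 qp_idx 0
"""

for (src, dst, status), n in count_failing_links(sample).most_common():
    print(f"{src} -> {dst}: {status} (seen {n} time(s))")
```

If one host appears in most of the failing pairs, its cable and port are a
reasonable place to start checking, although that is only a hint and the
fabric diagnostics remain the authoritative check.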
Pavel (Pasha) Shamis
---
Computer Science Research Group
That sounds like there's a problem with your InfiniBand fabric.
You should run a complete level-0 diagnostic on your IB network.
On Jun 17, 2013, at 5:23 AM, "Singh, Bharati (GE Global Research, consultant)"
wrote:
> Hi Team,
>
> Our users' jobs are hanging, and we see the errors below.
>
Hi Team,
Our users' jobs are hanging, and we see the errors below.
[[61410,1],65][btl_openib_component.c:3238:handle_wc] from
bng1aviationdc22 to: bng1aviationdc26 error polling LP CQ with status
RETRY EXCEEDED ERROR status number 12 for wr_id 774739584 opcode 1
vendor error 129 qp_idx 0