On Mar 1, 2009, at 7:24 PM, Brett Pemberton wrote:
I'd appreciate some advice on whether I'm using OFED correctly.
I'm running OFED 1.4, but only the userland, not the kernel modules.
Is this a bad idea?
I believe so. I'm not a kernel guy, but I've always used the userland
bits matched with th
Matt Hughes wrote:
2009/2/26 Brett Pemberton :
[[1176,1],0][btl_openib_component.c:2905:handle_wc] from tango092.vpac.org
to: tango090 error polling LP CQ with status RETRY EXCEEDED ERROR status
number 12 for wr_id 38996224 opcode 0 qp_idx 0
What OS are you using?
CentOS 5
I've seen this
On Feb 27, 2009, at 12:09 PM, Åke Sandgren wrote:
We see these errors fairly frequently on our CentOS 5.2 system with
Mellanox InfiniHost III cards. The OFED stack is whatever CentOS 5.2
uses. Has anyone tested that with the 1.4 OFED stack?
FWIW, I have tested OMPI's openib BTL with sev
Usually "retry exceeded error" points to a network issue, such as a bad
cable or a bad connector. You can use the ibdiagnet tool to debug the
network: http://linux.die.net/man/1/ibdiagnet. This tool is part of OFED.
Pasha
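
If the error shows up on many nodes, it can help to tally which host pairs
report it before checking cables. Below is a minimal sketch in Python,
assuming the message always has the layout of the one quoted in this
thread. The regular expression and the function names
(parse_openib_error, count_host_pairs) are illustrative only, not part of
Open MPI or OFED, and other Open MPI versions may word the message
differently.

```python
import re
from collections import Counter

# Pattern modeled on the single error message quoted in this thread.
# Assumption: other Open MPI versions may format the message differently.
ERROR_RE = re.compile(
    r"from\s+(?P<src>\S+)\s+to:\s+(?P<dst>\S+)\s+error polling (?P<cq>\w+) CQ "
    r"with status (?P<status>.+?) status number (?P<num>\d+)"
)


def parse_openib_error(text):
    """Return the fields of one error message as a dict, or None if no match."""
    # Mail clients and terminals wrap the message, so collapse whitespace first.
    flat = " ".join(text.split())
    match = ERROR_RE.search(flat)
    if match is None:
        return None
    fields = match.groupdict()
    fields["num"] = int(fields["num"])
    return fields


def count_host_pairs(messages):
    """Count how often each (source, destination) host pair reports an error."""
    pairs = Counter()
    for message in messages:
        fields = parse_openib_error(message)
        if fields is not None:
            pairs[(fields["src"], fields["dst"])] += 1
    return pairs


SAMPLE = """[[1176,1],0][btl_openib_component.c:2905:handle_wc] from
tango092.vpac.org to: tango090 error polling LP CQ with status RETRY
EXCEEDED ERROR status number 12 for wr_id 38996224 opcode 0 qp_idx 0"""

if __name__ == "__main__":
    print(parse_openib_error(SAMPLE))
    print(count_host_pairs([SAMPLE]))
```

A host or host pair that dominates the tally is a reasonable first place
to check cabling and connectors, alongside whatever ibdiagnet reports.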
Brett Pemberton wrote:
Hey,
I've had a couple of errors recently, of
On Fri, 2009-02-27 at 09:54 -0700, Matt Hughes wrote:
> 2009/2/26 Brett Pemberton :
> > [[1176,1],0][btl_openib_component.c:2905:handle_wc] from tango092.vpac.org
> > to: tango090 error polling LP CQ with status RETRY EXCEEDED ERROR status
> > number 12 for wr_id 38996224 opcode 0 qp_idx 0
>
> Wha
2009/2/26 Brett Pemberton :
> [[1176,1],0][btl_openib_component.c:2905:handle_wc] from tango092.vpac.org
> to: tango090 error polling LP CQ with status RETRY EXCEEDED ERROR status
> number 12 for wr_id 38996224 opcode 0 qp_idx 0
What OS are you using? I've seen this error and many other Infiniban
Bogdan Costescu wrote:
Brett Pemberton wrote:
[[1176,1],0][btl_openib_component.c:2905:handle_wc] from
tango092.vpac.org to: tango090 error polling LP CQ with status RETRY
EXCEEDED ERROR status number 12 for wr_id 38996224 opcode 0 qp_idx 0
I've seen this error with Mellanox ConnectX cards and OFED 1.2.x with
al
Hey,
I've had a couple of errors recently, of the form:
[[1176,1],0][btl_openib_component.c:2905:handle_wc] from
tango092.vpac.org to: tango090 error polling LP CQ with status RETRY
EXCEEDED ERROR status number 12 for wr_id 38996224 opcode 0 qp_idx 0