The bad:
OpenIB frequently crashes with the error:
***************
[0,1,2][btl_openib_endpoint.c:135:mca_btl_openib_endpoint_post_send] error posting send request
errno says Operation now in progress[0,1,2d
[0,1,3][btl_openib_endpoint.c:135:mca_btl_openib_endpoint_post_send] error posting send request
errno says Operation now in progress
[0,1,3][btl_openib_component.c:655:mca_btl_openib_component_progress] error in posting pending send
[0,1,2][btl_openib_endpoint.c:135:mca_btl_openib_endpoint_post_send] error posting send request
errno says Operation now in progress
[0,1,2][btl_openib_component.c:655:mca_btl_openib_component_progress] error in posting pending send
***************
Hey Troy,
I made a very small change in the trunk; the patch is below.
I was unable to test it, so it would be great if you could test it
and report back the results. Instructions for reproducing the
problem would also be great.
Thanks,
Galen
Index: btl_openib_endpoint.c
===================================================================
--- btl_openib_endpoint.c (revision 8126)
+++ btl_openib_endpoint.c (revision 8127)
@@ -74,6 +74,7 @@
struct ibv_qp* ib_qp;
struct ibv_send_wr* bad_wr;
frag->sg_entry.addr = (uintptr_t) frag->hdr;
+ frag->wr_desc.sr_desc.opcode = IBV_WR_SEND;
if(frag->base.des_flags & MCA_BTL_DES_FLAGS_PRIORITY && frag->size <= openib_btl->super.btl_eager_limit){
@@ -116,8 +117,8 @@
}
}
- frag->wr_desc.sr_desc.opcode = IBV_WR_SEND;
+
frag->sg_entry.length =
frag->segment.seg_len +
((unsigned char*) frag->segment.seg_addr.pval - (unsigned char*) frag->hdr);
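
For anyone reading along, here is a minimal sketch of what moving the opcode assignment could change. It assumes that the code between the two hunks (not shown in the patch) can defer a fragment and return before reaching the old assignment. That is an assumption, not something the patch itself confirms. The names toy_frag, toy_opcode, post_send_old, post_send_new, and the tokens parameter are hypothetical stand-ins, not Open MPI or libibverbs identifiers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; these are NOT the
 * Open MPI fragment type or the libibverbs opcode enum. */
enum toy_opcode { TOY_OP_UNSET = 0, TOY_OP_SEND = 1 };

struct toy_frag {
    enum toy_opcode opcode; /* plays the role of wr_desc.sr_desc.opcode */
    int queued;             /* 1 if the fragment was deferred */
};

/* Old ordering: the opcode is assigned only after the point where the
 * function may defer the fragment (assumed early-return path). */
static int post_send_old(struct toy_frag *frag, int tokens)
{
    assert(frag != NULL);
    if (tokens <= 0) {
        frag->queued = 1;
        return 0;           /* deferred before the opcode is set */
    }
    frag->opcode = TOY_OP_SEND;
    return 1;               /* posted immediately */
}

/* New ordering: the opcode is assigned first, so a deferred fragment
 * already carries it when it is posted later. */
static int post_send_new(struct toy_frag *frag, int tokens)
{
    assert(frag != NULL);
    frag->opcode = TOY_OP_SEND;
    if (tokens <= 0) {
        frag->queued = 1;
        return 0;           /* deferred, but the opcode is already set */
    }
    return 1;               /* posted immediately */
}
```

Calling both functions with tokens set to 0 leaves the fragment queued in each case, but only the new ordering leaves its opcode set. Whether this is what actually triggers the error above still needs to be confirmed by testing.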