The bad:
OpenIB frequently crashes with the error:
***************
[0,1,2][btl_openib_endpoint.c:135:mca_btl_openib_endpoint_post_send] error posting send request errno says Operation now in progress[0,1,2d
[0,1,3][btl_openib_endpoint.c:135:mca_btl_openib_endpoint_post_send] error posting send request errno says Operation now in progress
[0,1,3][btl_openib_component.c:655:mca_btl_openib_component_progress] error in posting pending send
[0,1,2][btl_openib_endpoint.c:135:mca_btl_openib_endpoint_post_send] error posting send request errno says Operation now in progress
[0,1,2][btl_openib_component.c:655:mca_btl_openib_component_progress] error in posting pending send
***************

Hey Troy,

I made a very small change in the trunk; here is the patch.

I was unable to test this, so it would be great if you could try it and report back the results. Instructions for reproducing the problem would also be very helpful.

Thanks,

Galen







Index: btl_openib_endpoint.c
===================================================================
--- btl_openib_endpoint.c       (revision 8126)
+++ btl_openib_endpoint.c       (revision 8127)
@@ -74,6 +74,7 @@
     struct ibv_qp* ib_qp;
     struct ibv_send_wr* bad_wr;
     frag->sg_entry.addr = (uintptr_t) frag->hdr;
+    frag->wr_desc.sr_desc.opcode = IBV_WR_SEND;

     if(frag->base.des_flags & MCA_BTL_DES_FLAGS_PRIORITY && frag->size <= openib_btl->super.btl_eager_limit){

@@ -116,8 +117,8 @@
         }
     }

-    frag->wr_desc.sr_desc.opcode = IBV_WR_SEND;

+
     frag->sg_entry.length =
         frag->segment.seg_len +
         ((unsigned char*) frag->segment.seg_addr.pval - (unsigned char*) frag->hdr);
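
The idea behind moving the opcode assignment can be sketched in isolation. This is only an illustration, not the Open MPI code: the types and functions below (fake_wr, fake_post, post_send_old, post_send_new) are hypothetical stand-ins, and it assumes the failing case is a code path that leaves the function before the old assignment was reached, which I have not confirmed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the libibverbs or Open MPI types. */
enum wr_opcode { WR_UNSET = 0, WR_SEND = 1 };

struct fake_wr {
    enum wr_opcode opcode;
};

/* Stand-in for posting a work request: returns the opcode that would be
 * handed to the device. */
static enum wr_opcode fake_post(const struct fake_wr *wr)
{
    assert(wr != NULL);
    return wr->opcode;
}

/* Old ordering: the opcode is set only after the branch, so a path that
 * leaves early hands over a request whose opcode was never set. */
enum wr_opcode post_send_old(bool early_path)
{
    struct fake_wr wr = { WR_UNSET };

    if (early_path) {
        return fake_post(&wr);   /* opcode is still WR_UNSET here */
    }

    wr.opcode = WR_SEND;
    return fake_post(&wr);
}

/* Patched ordering: the opcode is set before any branch, so every path
 * sees a fully initialized request. */
enum wr_opcode post_send_new(bool early_path)
{
    struct fake_wr wr = { WR_UNSET };

    wr.opcode = WR_SEND;

    if (early_path) {
        return fake_post(&wr);
    }
    return fake_post(&wr);
}
```

With the old ordering only the late path carries a valid opcode; with the patched ordering both paths do. Whether this is really what triggers the error above is exactly what needs testing.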
