On Wed, 2006-09-06 at 10:40 -0700, Tom Rosmond wrote:
> Brian,
>
> I notice in the OMPI_INFO output the following parameters that seem
> relevant to this problem:
>
>   MCA btl: parameter "btl_self_free_list_num" (current value: "0")
>   MCA btl: parameter "btl_self_free_list_max" (current value: "-1")
>   MCA btl: parameter "btl_self_free_list_inc" (current value: "32")
>   MCA btl: parameter "btl_self_eager_limit" (current value: "131072")
>   MCA btl: parameter "btl_self_max_send_size" (current value: "262144")
>   MCA btl: parameter "btl_self_max_rdma_size" (current value: "2147483647")
>   MCA btl: parameter "btl_self_exclusivity" (current value: "65536")
>   MCA btl: parameter "btl_self_flags" (current value: "2")
>   MCA btl: parameter "btl_self_priority" (current value: "0")
>
> Specifically the 'self_max_send_size=262144', which I assume is the
> maximum size (bytes?) message a processor can send to itself. None of
> the messages in my above tests approached this limit. However, I am
> puzzled by this, because the program below runs correctly for
> ridiculously large message sizes (as shown 200 Mbytes).

The self_max_send_size is the maximum size of a single fragment that can
be sent with that BTL, not the maximum size of a message. The upper layer
(the PML for point-to-point, or the one-sided component) is responsible
for fragmenting the message into small enough chunks. That is why your
200 Mbyte messages go through without trouble. There are actually a
couple of papers on our web site about how we do this (and even a bit of
why we do it).

I'm pretty sure this isn't the problem. I think the one-sided
implementation is violating an assumption of the point-to-point semantics
internally, and that is what is causing the badness.

Brian
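
For anyone who wants to see the fragmentation idea in numbers, here is a rough sketch in Python. This is not Open MPI code, and it leaves out the eager limit, RDMA, and everything else the real PML does. The function name fragment_sizes is made up for illustration, and it assumes "200 Mbytes" means 200 * 1024 * 1024 bytes. It only shows that a message far larger than max_send_size can still be sent when the upper layer cuts it into pieces that each fit.

```python
def fragment_sizes(message_len, max_send_size):
    """Yield the size of each fragment, none larger than max_send_size.

    Hypothetical helper for illustration only; not an Open MPI function.
    """
    if message_len < 0:
        raise ValueError("message_len must be non-negative")
    if max_send_size <= 0:
        raise ValueError("max_send_size must be positive")
    offset = 0
    while offset < message_len:
        # The last fragment may be shorter than max_send_size.
        size = min(max_send_size, message_len - offset)
        yield size
        offset += size


# btl_self_max_send_size as reported in the quoted ompi_info output.
MAX_SEND_SIZE = 262144
# Assumption: "200 Mbytes" taken as 200 MiB.
MESSAGE_LEN = 200 * 1024 * 1024

sizes = list(fragment_sizes(MESSAGE_LEN, MAX_SEND_SIZE))
print(len(sizes), max(sizes), sum(sizes))  # 800 262144 209715200
```

Under those assumptions the message becomes 800 fragments, each exactly at the limit, so no single send ever exceeds max_send_size.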