Here is the output of "ompi_info --param btl openib":
MCA btl: parameter "btl_openib_flags" (current value: <306>, data source: default value)
         BTL bit flags (general flags: SEND=1, PUT=2, GET=4, SEND_INPLACE=8,
         RDMA_MATCHED=64, HETEROGENEOUS_RDMA=256; flags only used by the "dr" PML
         (ignored by others): ACK=16, CHECKSUM=32, RDMA_COMPLETION=128; flags only
         used by the "bfo" PML (ignored by others): FAILOVER_SUPPORT=512)
So the 305 flags mean: HETEROGENEOUS_RDMA | CHECKSUM | ACK | SEND. Most of
these flags are useless in the current version of Open MPI (DR is not
supported), so the only values that really matter are SEND | HETEROGENEOUS_RDMA.
If you want to enable the send protocol, try SEND | SEND_INPLACE (9) first;
if that does not work, downgrade to SEND (1).
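
For anyone who wants to check the arithmetic, here is a minimal sketch in Python that decodes a btl_openib_flags value into flag names and builds a value from names. The flag values are copied from the ompi_info output above; the helper names decode_btl_flags and encode_btl_flags are made up for this illustration and are not part of Open MPI.

```python
# Flag values copied from the "ompi_info --param btl openib" output above.
# The helper functions below are hypothetical, for illustration only.
BTL_FLAGS = {
    "SEND": 1,
    "PUT": 2,
    "GET": 4,
    "SEND_INPLACE": 8,
    "ACK": 16,                 # only used by the "dr" PML
    "CHECKSUM": 32,            # only used by the "dr" PML
    "RDMA_MATCHED": 64,
    "RDMA_COMPLETION": 128,    # only used by the "dr" PML
    "HETEROGENEOUS_RDMA": 256,
    "FAILOVER_SUPPORT": 512,   # only used by the "bfo" PML
}


def decode_btl_flags(value):
    """Return the names of the flags set in value, lowest bit first."""
    ordered = sorted(BTL_FLAGS.items(), key=lambda item: item[1])
    return [name for name, bit in ordered if value & bit]


def encode_btl_flags(*names):
    """OR together the bits for the given flag names."""
    value = 0
    for name in names:
        value |= BTL_FLAGS[name]
    return value


if __name__ == "__main__":
    print(305, decode_btl_flags(305))
    print(306, decode_btl_flags(306))
    print(encode_btl_flags("SEND", "SEND_INPLACE"))
```

By the same table, the default value 306 decodes to PUT | ACK | CHECKSUM | HETEROGENEOUS_RDMA, so 305 differs from the default only in setting SEND instead of PUT.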
george.
On May 16, 2011, at 11:33 , Samuel K. Gutierrez wrote:
>
> On May 16, 2011, at 8:53 AM, Brock Palen wrote:
>
>>
>>
>>
>> On May 16, 2011, at 10:23 AM, Samuel K. Gutierrez wrote:
>>
>>> Hi,
>>>
>>> Just out of curiosity - what happens when you add the following MCA option
>>> to your openib runs?
>>>
>>> -mca btl_openib_flags 305
>>
>> You Sir found the magic combination.
>
> :-) - cool.
>
> Developers - does this smell like a registered memory availability hang?
>
>> I verified this lets IMB and CRASH progress past their lockup points.
>> I will have a user test this.
>
> Please let us know what you find.
>
>> Is this an ok option to put in our environment? What does 305 mean?
>
> There may be a performance hit associated with this configuration, but if it
> lets your users run, then I don't see a problem with adding it to your
> environment.
>
> If I'm reading things correctly, 305 turns off RDMA PUT/GET and turns on SEND.
>
> OpenFabrics gurus - please correct me if I'm wrong :-).
>
> Samuel Gutierrez
> Los Alamos National Laboratory
>
>
>>
>>
>> Brock Palen
>> www.umich.edu/~brockp
>> Center for Advanced Computing
>> [email protected]
>> (734)936-1985
>>
>>>
>>> Thanks,
>>>
>>> Samuel Gutierrez
>>> Los Alamos National Laboratory
>>>
>>> On May 13, 2011, at 2:38 PM, Brock Palen wrote:
>>>
>>>> On May 13, 2011, at 4:09 PM, Dave Love wrote:
>>>>
>>>>> Jeff Squyres <[email protected]> writes:
>>>>>
>>>>>> On May 11, 2011, at 3:21 PM, Dave Love wrote:
>>>>>>
>>>>>>> We can reproduce it with IMB. We could provide access, but we'd have to
>>>>>>> negotiate with the owners of the relevant nodes to give you interactive
>>>>>>> access to them. Maybe Brock's would be more accessible? (If you
>>>>>>> contact me, I may not be able to respond for a few days.)
>>>>>>
>>>>>> Brock has replied off-list that he, too, is able to reliably reproduce
>>>>>> the issue with IMB, and is working to get access for us. Many thanks
>>>>>> for your offer; let's see where Brock's access takes us.
>>>>>
>>>>> Good. Let me know if we could be useful
>>>>>
>>>>>>>> -- we have not closed this issue,
>>>>>>>
>>>>>>> Which issue? I couldn't find a relevant-looking one.
>>>>>>
>>>>>> https://svn.open-mpi.org/trac/ompi/ticket/2714
>>>>>
>>>>> Thanks. In case it's useful info, it hangs for me with 1.5.3 & np=32 on
>>>>> connectx with more than one collective; I can't recall which.
>>>>
>>>> Extra data point: that ticket said it ran with mpi_preconnect_mpi 1, but
>>>> that doesn't help here; both my production code (crash) and IMB still hang.
>>>>
>>>>
>>>> Brock Palen
>>>> www.umich.edu/~brockp
>>>> Center for Advanced Computing
>>>> [email protected]
>>>> (734)936-1985
>>>>
>>>>>
>>>>> --
>>>>> Excuse the typping -- I have a broken wrist
>>>>>
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> [email protected]
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
George Bosilca
Research Assistant Professor
Innovative Computing Laboratory
Department of Electrical Engineering and Computer Science
University of Tennessee, Knoxville
http://web.eecs.utk.edu/~bosilca/