On May 16, 2011, at 8:53 AM, Brock Palen wrote:

> On May 16, 2011, at 10:23 AM, Samuel K. Gutierrez wrote:
> 
>> Hi,
>> 
>> Just out of curiosity - what happens when you add the following MCA option 
>> to your openib runs?
>> 
>> -mca btl_openib_flags 305
> 
> You, sir, found the magic combination.

:-)  - cool.

Developers - does this smell like a registered memory availability hang?

> I verified that this lets IMB and CRASH progress past their lockup points.
> I will have a user test this.

Please let us know what you find.

> Is this an ok option to put in our environment?  What does 305 mean?

There may be a performance hit associated with this configuration, but if it 
lets your users run, then I don't see a problem with adding it to your 
environment.

If I'm reading things correctly, 305 turns off RDMA PUT/GET and turns on SEND.

OpenFabrics gurus - please correct me if I'm wrong :-).
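
For anyone who wants to check the arithmetic, here is a minimal Python sketch that splits a btl_openib_flags value into its individual bits. The bit names and values are recalled from the parameter's help text, not stated anywhere in this thread, so treat the table as an assumption and confirm it with `ompi_info --param btl openib` on your installation. The function name decode_btl_flags is hypothetical.

```python
# Assumed bit table for btl_openib_flags (recalled from the parameter's
# help text; verify against ompi_info on your own installation).
BTL_FLAGS = {
    1: "SEND",
    2: "PUT",
    4: "GET",
    8: "SEND_INPLACE",
    16: "ACK",
    32: "CHECKSUM",
    64: "RDMA_MATCHED",
    128: "RDMA_COMPLETION",
    256: "HETEROGENEOUS_RDMA",
    512: "FAILOVER_SUPPORT",
}


def decode_btl_flags(value):
    """Return the names of all bits set in value, lowest bit first."""
    return [name for bit, name in sorted(BTL_FLAGS.items()) if value & bit]


if __name__ == "__main__":
    # 305 = 256 + 32 + 16 + 1
    print(decode_btl_flags(305))
    # prints ['SEND', 'ACK', 'CHECKSUM', 'HETEROGENEOUS_RDMA']
```

Under that assumed table, the PUT (2) and GET (4) bits are clear and SEND (1) is set, which matches the reading above. The same value can also be set without touching the mpirun command line, either through the environment variable OMPI_MCA_btl_openib_flags or in an openmpi-mca-params.conf file.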

Samuel Gutierrez
Los Alamos National Laboratory


> 
> 
> Brock Palen
> www.umich.edu/~brockp
> Center for Advanced Computing
> bro...@umich.edu
> (734)936-1985
> 
>> 
>> Thanks,
>> 
>> Samuel Gutierrez
>> Los Alamos National Laboratory
>> 
>> On May 13, 2011, at 2:38 PM, Brock Palen wrote:
>> 
>>> On May 13, 2011, at 4:09 PM, Dave Love wrote:
>>> 
>>>> Jeff Squyres <jsquy...@cisco.com> writes:
>>>> 
>>>>> On May 11, 2011, at 3:21 PM, Dave Love wrote:
>>>>> 
>>>>>> We can reproduce it with IMB.  We could provide access, but we'd have to
>>>>>> negotiate with the owners of the relevant nodes to give you interactive
>>>>>> access to them.  Maybe Brock's would be more accessible?  (If you
>>>>>> contact me, I may not be able to respond for a few days.)
>>>>> 
>>>>> Brock has replied off-list that he, too, is able to reliably reproduce 
>>>>> the issue with IMB, and is working to get access for us.  Many thanks for 
>>>>> your offer; let's see where Brock's access takes us.
>>>> 
>>>> Good.  Let me know if we could be useful
>>>> 
>>>>>>> -- we have not closed this issue,
>>>>>> 
>>>>>> Which issue?   I couldn't find a relevant-looking one.
>>>>> 
>>>>> https://svn.open-mpi.org/trac/ompi/ticket/2714
>>>> 
>>>> Thanks.  In case it's useful info, it hangs for me with 1.5.3 & np=32 on
>>>> ConnectX with more than one collective; I can't recall which.
>>> 
>>> Extra data point: that ticket said it ran with mpi_preconnect_mpi 1. That
>>> doesn't help here; both my production code (CRASH) and IMB still hang.
>>> 
>>> 
>>> Brock Palen
>>> www.umich.edu/~brockp
>>> Center for Advanced Computing
>>> bro...@umich.edu
>>> (734)936-1985
>>> 
>>>> 
>>>> -- 
>>>> Excuse the typping -- I have a broken wrist
>>>> 
>>>> _______________________________________________
>>>> users mailing list
>>>> us...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>> 
>>>> 
>>> 
>>> 

