On May 16, 2011, at 8:53 AM, Brock Palen wrote:

> On May 16, 2011, at 10:23 AM, Samuel K. Gutierrez wrote:
>
>> Hi,
>>
>> Just out of curiosity - what happens when you add the following MCA option
>> to your openib runs?
>>
>> -mca btl_openib_flags 305
>
> You Sir found the magic combination.
:-) - cool.  Developers - does this smell like a registered memory
availability hang?

> I verified this lets IMB and CRASH progress past their lockup points.
> I will have a user test this.

Please let us know what you find.

> Is this an ok option to put in our environment?  What does 305 mean?

There may be a performance hit associated with this configuration, but if it
lets your users run, then I don't see a problem with adding it to your
environment.

If I'm reading things correctly, 305 turns off RDMA PUT/GET and turns on
SEND.  OpenFabrics gurus - please correct me if I'm wrong :-).

Samuel Gutierrez
Los Alamos National Laboratory

> Brock Palen
> www.umich.edu/~brockp
> Center for Advanced Computing
> bro...@umich.edu
> (734)936-1985
>
>> Thanks,
>>
>> Samuel Gutierrez
>> Los Alamos National Laboratory
>>
>> On May 13, 2011, at 2:38 PM, Brock Palen wrote:
>>
>>> On May 13, 2011, at 4:09 PM, Dave Love wrote:
>>>
>>>> Jeff Squyres <jsquy...@cisco.com> writes:
>>>>
>>>>> On May 11, 2011, at 3:21 PM, Dave Love wrote:
>>>>>
>>>>>> We can reproduce it with IMB.  We could provide access, but we'd have
>>>>>> to negotiate with the owners of the relevant nodes to give you
>>>>>> interactive access to them.  Maybe Brock's would be more accessible?
>>>>>> (If you contact me, I may not be able to respond for a few days.)
>>>>>
>>>>> Brock has replied off-list that he, too, is able to reliably reproduce
>>>>> the issue with IMB, and is working to get access for us.  Many thanks
>>>>> for your offer; let's see where Brock's access takes us.
>>>>
>>>> Good.  Let me know if we could be useful.
>>>>
>>>>>>> -- we have not closed this issue,
>>>>>>
>>>>>> Which issue?  I couldn't find a relevant-looking one.
>>>>>
>>>>> https://svn.open-mpi.org/trac/ompi/ticket/2714
>>>>
>>>> Thanks.  In case it's useful info, it hangs for me with 1.5.3 & np=32 on
>>>> ConnectX with more than one collective (I can't recall which).
>>>
>>> Extra data point: that ticket said it ran with mpi_preconnect_mpi 1, but
>>> that doesn't help here; both my production code (CRASH) and IMB still
>>> hang.
>>>
>>> Brock Palen
>>> www.umich.edu/~brockp
>>> Center for Advanced Computing
>>> bro...@umich.edu
>>> (734)936-1985
>>>
>>>> --
>>>> Excuse the typing -- I have a broken wrist
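P.S.  For anyone wondering how 305 decodes: btl_openib_flags is a bitmask
built from the MCA_BTL_FLAGS_* constants in Open MPI's ompi/mca/btl/btl.h.
Below is a minimal C sketch of how the bits break down; the exact constant
values are my reading of that header and should be treated as assumptions -
check your source tree.  The point is simply that in 305 (0x131) the SEND bit
is set while the PUT and GET (RDMA) bits are not.

    /* Sketch: decode an Open MPI btl_openib_flags value into its bits.
     * Flag values below are assumed from ompi/mca/btl/btl.h; verify
     * against your Open MPI source tree before relying on them. */
    #include <stdio.h>

    #define FLAG_SEND               0x001  /* assumed MCA_BTL_FLAGS_SEND */
    #define FLAG_PUT                0x002  /* assumed MCA_BTL_FLAGS_PUT (RDMA write) */
    #define FLAG_GET                0x004  /* assumed MCA_BTL_FLAGS_GET (RDMA read) */
    #define FLAG_SEND_INPLACE       0x008  /* assumed MCA_BTL_FLAGS_SEND_INPLACE */
    #define FLAG_NEED_ACK           0x010  /* assumed MCA_BTL_FLAGS_NEED_ACK */
    #define FLAG_NEED_CSUM          0x020  /* assumed MCA_BTL_FLAGS_NEED_CSUM */
    #define FLAG_HETEROGENEOUS_RDMA 0x100  /* assumed MCA_BTL_FLAGS_HETEROGENEOUS_RDMA */

    int main(void)
    {
        int flags = 305;  /* the value suggested above: 0x131 */

        printf("btl_openib_flags = %d (0x%x)\n", flags, flags);
        printf("  SEND:               %s\n", (flags & FLAG_SEND) ? "on" : "off");
        printf("  PUT (RDMA write):   %s\n", (flags & FLAG_PUT) ? "on" : "off");
        printf("  GET (RDMA read):    %s\n", (flags & FLAG_GET) ? "on" : "off");
        printf("  SEND_INPLACE:       %s\n", (flags & FLAG_SEND_INPLACE) ? "on" : "off");
        printf("  NEED_ACK:           %s\n", (flags & FLAG_NEED_ACK) ? "on" : "off");
        printf("  NEED_CSUM:          %s\n", (flags & FLAG_NEED_CSUM) ? "on" : "off");
        printf("  HETEROGENEOUS_RDMA: %s\n", (flags & FLAG_HETEROGENEOUS_RDMA) ? "on" : "off");
        return 0;
    }

If those assumed values are right, only the send/receive path is enabled
among the data-movement bits, so large messages avoid the RDMA protocol
entirely - consistent with the performance caveat above, and with why it
could sidestep a registered-memory-related hang.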