Re: [OMPI users] OPENSHMEM ERROR with 2+ Distributed Machines

2016-08-17 Thread Jeff Squyres (jsquyres)
Debendra -

A fix has been submitted for the v2.0.1 release.  Could you give it a try with 
the latest snapshot (anything dated on or after Aug 17):

   https://www.open-mpi.org/nightly/v2.x/


> On Aug 16, 2016, at 6:21 AM, Gilles Gouaillardet wrote:
> 
> Assuming you have an InfiniBand network, another option is to install mxm
> (Mellanox's proprietary but free library) and rebuild Open MPI.
> pml/yalla will then be used instead of ob1, and you should be just fine.
> 
> Cheers,
> 
> Gilles
> 
> On Tuesday, August 16, 2016, Jeff Squyres (jsquyres) wrote:
> On Aug 16, 2016, at 6:09 AM, Debendra Das wrote:
> >
> > As far as I understand, I have to wait for version 2.0.1 to fix the issue. So
> > can you please give me an idea of when 2.0.1 will be released?
> 
> We had hoped to release it today, actually.  :-\  But there are still a few
> issues we're working out (including this one).
> 
> > Also, I could not understand how to use the patch.
> 
> I think that's ok because we don't have agreement that that patch is the 
> correct fix yet, anyway.
> 
> Sorry for the delay.  :-\
> 
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI users] SGE integration broken in 2.0.0

2016-08-17 Thread Orion Poplawski
On 08/12/2016 02:59 PM, r...@open-mpi.org wrote:
> 
>> On Aug 12, 2016, at 1:48 PM, Reuti wrote:
>>
>>
>> On 12.08.2016 at 21:44, r...@open-mpi.org wrote:
>>
>>> Don’t know about the toolchain issue - I use those same versions, and don’t
>>> have a problem. I’m on CentOS-7, so that might be the difference?
>>>
>>> Anyway, I found the missing code to assemble the cmd line for qrsh - not
>>> sure how/why it got deleted.
>>>
>>> https://github.com/open-mpi/ompi/pull/1960
>>
>> Yep, it's working again - thx.
>>
>> But surely there was a reason behind the removal, which the Open MPI team
>> may want to discuss to avoid any side effects from fixing this issue.
> 
> I actually don’t recall a reason - and I’m the one that generally maintains
> that code area. I think it fell off the map accidentally when I was updating
> that area.
> 
> However, we’ll toss it out there for comment - anyone recall?

FWIW, this is the commit that removed it:
https://github.com/open-mpi/ompi/commit/0140ff048d1db39c0337162788d3811f39926d7e

Author: Ralph Castain 
Date:   Thu Sep 24 07:16:48 2015 -0700

Now that we have an "isolated" PLM component, we cannot just let rsh
silently decline to run

This required modifying the mca_component_select function to actually
check the return code

Also do a little cleanup to avoid bombarding the user with multiple error
messages.

Thanks to Patrick Begou for reporting the problem



-- 
Orion Poplawski
Technical Manager 303-415-9701 x222
NWRA, Boulder/CoRA Office FAX: 303-415-9702
3380 Mitchell Lane   or...@nwra.com
Boulder, CO 80301   http://www.nwra.com