If you can send us some more info on how it breaks, that would be helpful. I’ll 
file it as an issue so we can track things.

Thanks
Ralph


> On Feb 20, 2017, at 9:13 AM, Bennet Fauber <ben...@umich.edu> wrote:
> 
> I got mixed results when bringing a container whose OMPI was built
> without the IB and Torque libraries to a cluster whose OMPI has them.
> 
> The short summary is that multinode communication seems unreliable.  I
> can mostly get up to 8 procs, two per node, to run, but not beyond
> that.  In a couple of cases, a particular node seemed able to cause a
> problem.  I am going to try again, making the configure line inside the
> container the same as outside, but I have to chase down the IB and
> Torque libraries to do so.
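> 
> For reference, a run like that might be launched from the host along
> these lines (the image and application names below are just
> placeholders, not the actual test case):
> 
>   # host mpirun launches 8 ranks, two per node, each inside the container
>   mpirun -np 8 --map-by ppr:2:node \
>       singularity exec ./mycontainer.img ./my_mpi_app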
> 
> If you're interested in how it breaks, I can send you some more
> information.  If there are diagnostics you would like, I can try to
> provide those.  I will be gone starting Thu for a week.
> 
> -- bennet
> 
> 
> 
> 
> On Fri, Feb 17, 2017 at 11:20 PM, r...@open-mpi.org <r...@open-mpi.org> wrote:
>> I -think- that is correct, but you may need the verbs library as well - I 
>> honestly don’t remember whether the configure script checks for functions in 
>> the library or not. If so, then you’ll need it wherever you build OMPI, but 
>> everything else is accurate.
>> 
>> Good luck - and let us know how it goes!
>> Ralph
>> 
>>> On Feb 17, 2017, at 4:34 PM, Bennet Fauber <ben...@umich.edu> wrote:
>>> 
>>> Ralph,
>>> 
>>> I will be building from the master branch at github.com for testing
>>> purposes.  We are not 'supporting' Singularity container creation, but
>>> we do hope to be able to offer some guidance, so I think we can
>>> finesse the PMIx version, yes?
>>> 
>>> That is good to know about the verbs headers being the only thing
>>> needed; thanks for that detail.  Sometimes the library also needs to
>>> be present.
>>> 
>>> Also very good to know that the host mpirun will start processes, as
>>> we are using cgroups, and if the processes get started by a
>>> non-tm-supporting MPI, they will be outside the proper cgroup.
>>> 
>>> So, just to recap: if I install from the current master at
>>> http://github.com/open-mpi/ompi.git both on the host system and within
>>> the container, copy the verbs headers into the container, and then
>>> configure and build OMPI within the container without TM support, I
>>> should be able to copy the container to the cluster and run it with
>>> verbs and with the system OMPI providing tm.
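>>> 
>>> For concreteness, a minimal sketch of what the configure inside the
>>> container might look like under that recap (the prefix, compilers, and
>>> flags are just my guess, not a tested recipe):
>>> 
>>>   # container-side build: verbs enabled, Torque/tm explicitly skipped
>>>   ./configure --prefix=/usr/local \
>>>       --with-verbs \
>>>       --without-tm \
>>>       --disable-dlopen --enable-shared \
>>>       CC=gcc CXX=g++ FC=gfortran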
>>> 
>>> If a user were to build without verbs support, it would still run,
>>> but it would fall back to non-verbs communication, so it would just be
>>> commensurately slower.
>>> 
>>> Let me know if I've garbled things.  Otherwise, wish me luck, and have
>>> a good weekend!
>>> 
>>> Thanks,  -- bennet
>>> 
>>> 
>>> 
>>> On Fri, Feb 17, 2017 at 7:24 PM, r...@open-mpi.org <r...@open-mpi.org> 
>>> wrote:
>>>> The embedded Singularity support hasn’t made it into the OMPI 2.x release 
>>>> series yet, though OMPI will still work within a Singularity container 
>>>> anyway.
>>>> 
>>>> Compatibility across the container boundary is always a problem, as your 
>>>> examples illustrate. If the system is using one OMPI version and the 
>>>> container is using another, then the only concern is the process-to-ORTE 
>>>> daemon communication across the container boundary. In the OMPI 2.x series 
>>>> and beyond, this is done with PMIx. OMPI v2.0 is based on PMIx v1.x, as 
>>>> will be OMPI v2.1, so there is no compatibility issue there. However, that 
>>>> statement is _not_ true for the OMPI v1.10 and earlier series.
>>>> 
>>>> Future OMPI versions will utilize PMIx v2 and above, which include a 
>>>> cross-version compatibility layer. Thus, you shouldn’t have any issues 
>>>> mixing and matching OMPI versions in this regard.
>>>> 
>>>> However, your second example is a perfect illustration of where 
>>>> containerization can break down. If you build your container on a system 
>>>> that doesn’t have (for example) tm and verbs installed on it, then those 
>>>> OMPI components will not be built. The tm component won’t matter as the 
>>>> system version of mpirun will be executing, and it presumably knows how to 
>>>> interact with Torque.
>>>> 
>>>> However, if you run that container on a system that has verbs, your 
>>>> application won’t be able to utilize the verbs support because those 
>>>> components were never compiled. Note that the converse is not true: if you 
>>>> build your container on a system that has verbs installed, you can then 
>>>> run it on a system that doesn’t have verbs support and those components 
>>>> will dynamically disqualify themselves.
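>>>> 
>>>> If you want to double-check which components actually made it into a
>>>> given build, ompi_info is handy; for example, something like this lists
>>>> the BTL components, and the verbs-based openib BTL should only appear
>>>> if verbs support was compiled in:
>>>> 
>>>>   # show the byte-transfer-layer components built into this OMPI install
>>>>   ompi_info | grep "MCA btl"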
>>>> 
>>>> Remember, you only need the verbs headers to be installed - you don’t have 
>>>> to build on a machine that actually has a verbs-supporting NIC installed 
>>>> (this is how the distributions get around the problem). Thus, it isn’t 
>>>> hard to avoid this portability problem - you just need to think ahead a 
>>>> bit.
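>>>> 
>>>> As a rough example (exact package names vary by distribution), the
>>>> verbs development headers alone can usually be pulled in from the
>>>> distro packages on the build machine:
>>>> 
>>>>   yum install libibverbs-devel       # RHEL/CentOS-style systems
>>>>   apt-get install libibverbs-dev     # Debian/Ubuntu-style systems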
>>>> 
>>>> HTH
>>>> Ralph
>>>> 
>>>>> On Feb 17, 2017, at 3:49 PM, Bennet Fauber <ben...@umich.edu> wrote:
>>>>> 
>>>>> I would like to follow the instructions on the Singularity web site,
>>>>> 
>>>>>  http://singularity.lbl.gov/docs-hpc
>>>>> 
>>>>> to test Singularity and OMPI on our cluster.  The configure line I have
>>>>> normally used for the 1.x series looks like this:
>>>>> 
>>>>> ./configure --prefix=/usr/local \
>>>>> --mandir=${PREFIX}/share/man \
>>>>> --with-tm --with-verbs \
>>>>> --disable-dlopen --enable-shared \
>>>>> CC=gcc CXX=g++ FC=gfortran
>>>>> 
>>>>> I have a couple of wonderments.
>>>>> 
>>>>> First, I presume it will be best to have the same version of OMPI
>>>>> inside the container as outside, but how sensitive will it be to minor
>>>>> versions?  All 2.1.x versions should be fine, but mixing 2.1.x outside
>>>>> with 2.2.x inside, or vice versa, would not be (it might be backward
>>>>> compatible but not forward)?
>>>>> 
>>>>> Second, if someone builds OMPI inside their container on an external
>>>>> system, without tm and verbs, and then brings the container to our
>>>>> system, will tm and verbs be handled by the calling mpirun from the
>>>>> host system, so that the OMPI inside the container won't care?  Will
>>>>> not having those inside the container cause them to be suppressed
>>>>> outside?
>>>>> 
>>>>> Thanks in advance,  -- bennet