I got mixed results when bringing a container whose OMPI was built
without the IB and Torque libraries to a cluster where the system OMPI
has them.

The short summary is that multinode communication seems unreliable.  I
can mostly get up to 8 procs, two per node, to run, but not beyond
that.  In a couple of cases, a particular node seemed to be the source
of the problem.  I am going to try again with the configure line inside
the container made the same as the one outside, but I have to chase
down the IB and Torque headers and libraries to do so.

If you're interested in how it breaks, I can send you some more
information.  If there are diagnostics you would like, I can try to
provide those.  I will be gone for a week starting Thursday.

-- bennet




On Fri, Feb 17, 2017 at 11:20 PM, r...@open-mpi.org <r...@open-mpi.org> wrote:
> I -think- that is correct, but you may need the verbs library as well - I 
> honestly don’t remember if the configury checks for functions in the library 
> or not. If so, then you’ll need that wherever you build OMPI, but everything 
> else is accurate.
>
> Good luck - and let us know how it goes!
> Ralph
>
>> On Feb 17, 2017, at 4:34 PM, Bennet Fauber <ben...@umich.edu> wrote:
>>
>> Ralph.
>>
>> I will be building from the master branch at github.com for testing
>> purposes.  We are not 'supporting' Singularity container creation, but
>> we do hope to be able to offer some guidance, so I think we can
>> finesse the PMIx version, yes?
>>
>> That is good to know about the verbs headers being the only thing
>> needed; thanks for that detail.  Sometimes the library also needs to
>> be present.
>>
>> Also very good to know that the host mpirun will start the processes.
>> We are using cgroups, and if the processes were started by an MPI
>> without tm support, they would end up outside the proper cgroup.
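>>
>> Concretely, the launch we have in mind would be something along these
>> lines, with the image name and application binary made up for
>> illustration:
>>
>>   mpirun -np 16 singularity exec /path/to/container.img /usr/local/bin/my_app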
>>
>> So, just to recap: if I install from the current master at
>> http://github.com/open-mpi/ompi.git on the host system, copy the verbs
>> headers into the container, and then configure and build the same
>> master within the container without TM support, I should be able to
>> copy the container to the cluster and run it with verbs and the system
>> OMPI using tm.
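>>
>> Roughly, I am picturing something like the following sketch for the
>> container side; the header locations and container path are guesses
>> and will vary by distribution:
>>
>>   # on the host: copy the verbs headers into the container's tree
>>   cp -r /usr/include/infiniband /usr/include/rdma ${CONTAINER}/usr/include/
>>
>>   # inside the container: build OMPI master with verbs, without tm
>>   git clone http://github.com/open-mpi/ompi.git && cd ompi
>>   ./autogen.pl
>>   ./configure --prefix=/usr/local --with-verbs --disable-dlopen --enable-shared
>>   make -j4 && make install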
>>
>> If a user were to build without verbs support, it would still run; it
>> would fall back to non-verbs communication and just be commensurately
>> slower.
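>>
>> To confirm which transport actually gets selected at run time, I
>> believe I can turn up the BTL verbosity on a small test, something
>> like:
>>
>>   mpirun --mca btl_base_verbose 30 -np 2 ./my_app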
>>
>> Let me know if I've garbled things.  Otherwise, wish me luck, and have
>> a good weekend!
>>
>> Thanks,  -- bennet
>>
>>
>>
>> On Fri, Feb 17, 2017 at 7:24 PM, r...@open-mpi.org <r...@open-mpi.org> wrote:
>>> The embedded Singularity support hasn’t made it into the OMPI 2.x release 
>>> series yet, though OMPI will still work within a Singularity container 
>>> anyway.
>>>
>>> Compatibility across the container boundary is always a problem, as your 
>>> examples illustrate. If the system is using one OMPI version and the 
>>> container is using another, then the only concern is compatibility across 
>>> the container boundary of the process-to-ORTE daemon communication. In the 
>>> OMPI 2.x series and beyond, this is done with PMIx. OMPI v2.0 is based on 
>>> PMIx v1.x, as will be OMPI v2.1. Thus, there is no compatibility issue 
>>> there. However, that statement is _not_ true for OMPI v1.10 and earlier 
>>> series.
>>>
>>> Future OMPI versions will utilize PMIx v2 and above, which include a 
>>> cross-version compatibility layer. Thus, you shouldn’t have any issues 
>>> mixing and matching OMPI versions from this regard.
>>>
>>> However, your second example is a perfect illustration of where 
>>> containerization can break down. If you build your container on a system 
>>> that doesn’t have (for example) tm and verbs installed on it, then those 
>>> OMPI components will not be built. The tm component won’t matter as the 
>>> system version of mpirun will be executing, and it presumably knows how to 
>>> interact with Torque.
>>>
>>> However, if you run that container on a system that has verbs, your 
>>> application won’t be able to utilize the verbs support because those 
>>> components were never compiled. Note that the converse is not true: if you 
>>> build your container on a system that has verbs installed, you can then run 
>>> it on a system that doesn’t have verbs support and those components will 
>>> dynamically disqualify themselves.
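>>>
>>> You can always check which components actually made it into a given
>>> installation with ompi_info, for example something along the lines of:
>>>
>>>   ompi_info | grep -E 'btl|plm'
>>>
>>> which will show whether the openib (verbs) BTL and the tm launcher
>>> components are present.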
>>>
>>> Remember, you only need the verbs headers to be installed - you don’t have 
>>> to build on a machine that actually has a verbs-supporting NIC installed 
>>> (this is how the distributions get around the problem). Thus, it isn’t hard 
>>> to avoid this portability problem - you just need to think ahead a bit.
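>>>
>>> On a typical build box that just means installing the development
>>> package - e.g., something like libibverbs-devel on RHEL/CentOS or
>>> libibverbs-dev on Debian/Ubuntu:
>>>
>>>   yum install libibverbs-devel      # RHEL/CentOS
>>>   apt-get install libibverbs-dev    # Debian/Ubuntu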
>>>
>>> HTH
>>> Ralph
>>>
>>>> On Feb 17, 2017, at 3:49 PM, Bennet Fauber <ben...@umich.edu> wrote:
>>>>
>>>> I would like to follow the instructions on the Singularity web site,
>>>>
>>>>   http://singularity.lbl.gov/docs-hpc
>>>>
>>>> to test Singularity and OMPI on our cluster.  My usual configure line
>>>> for the 1.x series looked like this.
>>>>
>>>> ./configure --prefix=/usr/local \
>>>>  --mandir=${PREFIX}/share/man \
>>>>  --with-tm --with-verbs \
>>>>  --disable-dlopen --enable-shared \
>>>>  CC=gcc CXX=g++ FC=gfortran
>>>>
>>>> I have a couple of wonderments.
>>>>
>>>> First, I presume it will be best to have the same version of OMPI
>>>> inside the container as out, but how sensitive will it be to minor
>>>> versions?  All 2.1.x versions should be fine, but I should not mix 2.1.x
>>>> outside with 2.2.x inside or vice versa (it might be backward compatible
>>>> but not forward)?
>>>>
>>>> Second, if someone builds OMPI inside their container on an external
>>>> system without tm and verbs, then brings the container to our system,
>>>> will tm and verbs be handled by the calling mpirun from the host
>>>> system, so that the OMPI inside the container won't care?  Will not
>>>> having those inside the container cause them to be suppressed outside?
>>>>
>>>> Thanks in advance,  -- bennet