I got mixed results when bringing a container whose OMPI was built without the
IB (verbs) and Torque libraries to a cluster where the system OMPI has both.

The short summary is that multinode communication seems unreliable. I can
usually get up to 8 processes, two per node, to run, but not more than that.
In a couple of cases, a particular node seemed to be the one causing the
problem. I am going to try again with the configure line inside the container
made the same as the one outside, but I have to chase down the IB and Torque
dependencies to do so.
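
In case it helps, here is roughly how I am comparing the two builds (just a
sketch; it assumes ompi_info is on the PATH in both places, and "mympi.img"
stands in for whatever the container is actually named):

  # On the host:
  ompi_info | grep -i 'configure command'   # the configure line the build used
  ompi_info | grep -E 'MCA (btl|plm|ras)'   # which transport/launcher components were built

  # Inside the container:
  singularity exec mympi.img ompi_info | grep -i 'configure command'
  singularity exec mympi.img ompi_info | grep -E 'MCA (btl|plm|ras)'
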
If you're interested in how it breaks, I can send you some more information,
and if there are diagnostics you would like, I can try to provide those. I
will be gone starting Thursday for a week.

-- bennet


On Fri, Feb 17, 2017 at 11:20 PM, r...@open-mpi.org <r...@open-mpi.org> wrote:
> I -think- that is correct, but you may need the verbs library as well - I
> honestly don't remember whether the configury checks for functions in the
> library or not. If it does, then you'll need that wherever you build OMPI,
> but everything else is accurate.
>
> Good luck - and let us know how it goes!
> Ralph
>
>> On Feb 17, 2017, at 4:34 PM, Bennet Fauber <ben...@umich.edu> wrote:
>>
>> Ralph,
>>
>> I will be building from the master branch at github.com for testing
>> purposes. We are not 'supporting' Singularity container creation, but
>> we do hope to be able to offer some guidance, so I think we can
>> finesse the PMIx version, yes?
>>
>> That is good to know about the verbs headers being the only thing
>> needed; thanks for that detail. Sometimes the library also needs to
>> be present.
>>
>> It is also very good to know that the host mpirun will start the
>> processes. We are using cgroups, and if the processes were started by
>> a non-tm-supporting MPI, they would end up outside the proper cgroup.
>>
>> So, just to recap: if I install from the current master at
>> http://github.com/open-mpi/ompi.git on the host system and within the
>> container, copy the verbs headers into the container, and then
>> configure and build OMPI within the container without TM support, I
>> should be able to copy the container to the cluster and run it with
>> verbs, with the system OMPI providing tm.
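>>
>> Concretely, something like this is what I have in mind for the
>> container-side build (just a sketch -- the prefix is a placeholder,
>> and it assumes the verbs headers have already been copied in):
>>
>>   # Configure OMPI inside the container: keep verbs, explicitly skip
>>   # tm, since the host mpirun will do the launching under Torque.
>>   ./configure --prefix=/usr/local \
>>       --with-verbs --without-tm \
>>       --disable-dlopen --enable-shared \
>>       CC=gcc CXX=g++ FC=gfortran
>>   make && make install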
>>
>> If a user were to build without the verbs support, it would still run,
>> but it would fall back to non-verbs communication and just be
>> commensurately slower.
>>
>> Let me know if I've garbled things. Otherwise, wish me luck, and have
>> a good weekend!
>>
>> Thanks,
>> -- bennet
>>
>>
>> On Fri, Feb 17, 2017 at 7:24 PM, r...@open-mpi.org <r...@open-mpi.org> wrote:
>>> The embedded Singularity support hasn't made it into the OMPI 2.x
>>> release series yet, though OMPI will still work within a Singularity
>>> container anyway.
>>>
>>> Compatibility across the container boundary is always a problem, as
>>> your examples illustrate. If the system is using one OMPI version and
>>> the container is using another, then the only concern is the
>>> compatibility of the process-to-ORTE-daemon communication across that
>>> boundary. In the OMPI 2.x series and beyond, this is done with PMIx.
>>> OMPI v2.0 is based on PMIx v1.x, as OMPI v2.1 will be, so there is no
>>> compatibility issue there. However, that statement is _not_ true for
>>> the OMPI v1.10 and earlier series.
>>>
>>> Future OMPI versions will use PMIx v2 and above, which includes a
>>> cross-version compatibility layer, so you shouldn't have any issues
>>> mixing and matching OMPI versions in that regard.
>>>
>>> However, your second example is a perfect illustration of where
>>> containerization can break down. If you build your container on a
>>> system that doesn't have (for example) tm and verbs installed, then
>>> those OMPI components will not be built. The tm component won't
>>> matter, because the system version of mpirun will be doing the
>>> launching, and it presumably knows how to interact with Torque.
>>>
>>> However, if you then run that container on a system that has verbs,
>>> your application won't be able to use the verbs support, because
>>> those components were never compiled. Note that the converse is not
>>> true: if you build your container on a system that has verbs
>>> installed, you can run it on a system that doesn't have verbs
>>> support, and those components will dynamically disqualify themselves.
>>>
>>> Remember, you only need the verbs headers to be installed - you don't
>>> have to build on a machine that actually has a verbs-capable NIC
>>> (this is how the distributions get around the problem). Thus, it
>>> isn't hard to avoid this portability problem - you just need to think
>>> ahead a bit.
>>>
>>> HTH
>>> Ralph
>>>
>>>> On Feb 17, 2017, at 3:49 PM, Bennet Fauber <ben...@umich.edu> wrote:
>>>>
>>>> I would like to follow the instructions on the Singularity web site,
>>>>
>>>> http://singularity.lbl.gov/docs-hpc
>>>>
>>>> to test Singularity and OMPI on our cluster. My usual configure line
>>>> for the 1.x series looked like this:
>>>>
>>>> ./configure --prefix=/usr/local \
>>>>     --mandir=${PREFIX}/share/man \
>>>>     --with-tm --with-verbs \
>>>>     --disable-dlopen --enable-shared \
>>>>     CC=gcc CXX=g++ FC=gfortran
>>>>
>>>> I have a couple of wonderments.
>>>>
>>>> First, I presume it will be best to have the same version of OMPI
>>>> inside the container as outside, but how sensitive will it be to
>>>> minor versions? All 2.1.x versions should be fine, but one should
>>>> not mix 2.1.x outside with 2.2.x inside or vice versa (it might be
>>>> backward compatible but not forward)?
>>>>
>>>> Second, if someone builds OMPI inside their container on an external
>>>> system, without tm and verbs, and then brings the container to our
>>>> system, will tm and verbs be handled by the calling mpirun from the
>>>> host system, so that the OMPI inside the container won't care? Or
>>>> will not having them inside the container cause them to be
>>>> suppressed outside?
>>>>
>>>> Thanks in advance,
>>>> -- bennet
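>>>>
>>>> P.S. For concreteness, the hybrid launch I have in mind is roughly
>>>> the pattern described on that page (a sketch only -- the process
>>>> count, image name, and application path are placeholders):
>>>>
>>>>   # The host-side mpirun launches the ranks; each rank runs inside
>>>>   # the container.
>>>>   mpirun -np 16 singularity exec ./mycontainer.img /usr/bin/my_mpi_app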