If you can send us some more info on how it breaks, that would be helpful. I’ll file it as an issue so we can track things.
Thanks
Ralph

> On Feb 20, 2017, at 9:13 AM, Bennet Fauber <ben...@umich.edu> wrote:
>
> I got mixed results when bringing a container that doesn't have the IB and Torque libraries compiled into the OMPI inside the container to a cluster where it does.
>
> The short summary is that multinode communication seems unreliable. I can mostly get up to 8 procs, two-per-node, to run, but not beyond that. In a couple of cases, a particular node seemed able to cause a problem. I am going to try again making the configure line inside the container the same as outside, but I have to chase down the IB and Torque to do so.
>
> If you're interested in how it breaks, I can send you some more information. If there are diagnostics you would like, I can try to provide those. I will be gone starting Thu for a week.
>
> -- bennet
>
>
> On Fri, Feb 17, 2017 at 11:20 PM, r...@open-mpi.org <r...@open-mpi.org> wrote:
>> I -think- that is correct, but you may need the verbs library as well - I honestly don’t remember if the configury checks for functions in the library or not. If so, then you’ll need that wherever you build OMPI, but everything else is accurate.
>>
>> Good luck - and let us know how it goes!
>> Ralph
>>
>>> On Feb 17, 2017, at 4:34 PM, Bennet Fauber <ben...@umich.edu> wrote:
>>>
>>> Ralph,
>>>
>>> I will be building from the master branch at github.com for testing purposes. We are not 'supporting' Singularity container creation, but we do hope to be able to offer some guidance, so I think we can finesse the PMIx version, yes?
>>>
>>> That is good to know about the verbs headers being the only thing needed; thanks for that detail. Sometimes the library also needs to be present.
>>>
>>> Also very good to know that the host mpirun will start the processes, as we are using cgroups, and if the processes get started by a non-tm-supporting MPI, they will be outside the proper cgroup.
>>>
>>> So, just to recap: if I install from the current master at http://github.com/open-mpi/ompi.git on the host system and within the container, copy the verbs headers into the container, then configure and build OMPI within the container and ignore TM support, I should be able to copy the container to the cluster and run it with verbs and the system OMPI using tm.
>>>
>>> If a user were to build without the verbs support, it would still run, but it would fall back to non-verbs communication, so it would just be commensurately slower.
>>>
>>> Let me know if I've garbled things. Otherwise, wish me luck, and have a good weekend!
>>>
>>> Thanks,
>>> -- bennet
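A host-side launch along the lines of that recap might look like the sketch below. This is only an illustration: the image name, application path, and process counts are placeholders, and it assumes the system-installed mpirun (built with tm and verbs) does the launching under Torque while the container just provides a PMIx-compatible OMPI.

    # Launch from the host under Torque, two processes per node, using the
    # system Open MPI; the container image and program names are hypothetical.
    mpirun -np 8 --map-by ppr:2:node \
        singularity exec ./ompi-test.img /usr/local/bin/hello_mpi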
>>> On Fri, Feb 17, 2017 at 7:24 PM, r...@open-mpi.org <r...@open-mpi.org> wrote:
>>>> The embedded Singularity support hasn’t made it into the OMPI 2.x release series yet, though OMPI will still work within a Singularity container anyway.
>>>>
>>>> Compatibility across the container boundary is always a problem, as your examples illustrate. If the system is using one OMPI version and the container is using another, then the only concern is compatibility of the process-to-ORTE-daemon communication across the container boundary. In the OMPI 2.x series and beyond, this is done with PMIx. OMPI v2.0 is based on PMIx v1.x, as will be OMPI v2.1, so there is no compatibility issue there. However, that statement is _not_ true for the OMPI v1.10 and earlier series.
>>>>
>>>> Future OMPI versions will utilize PMIx v2 and above, which includes a cross-version compatibility layer, so you shouldn’t have any issues mixing and matching OMPI versions in this regard.
>>>>
>>>> However, your second example is a perfect illustration of where containerization can break down. If you build your container on a system that doesn’t have (for example) tm and verbs installed on it, then those OMPI components will not be built. The tm component won’t matter, as the system version of mpirun will be executing, and it presumably knows how to interact with Torque.
>>>>
>>>> However, if you run that container on a system that has verbs, your application won’t be able to utilize the verbs support because those components were never compiled. Note that the converse is not true: if you build your container on a system that has verbs installed, you can then run it on a system that doesn’t have verbs support, and those components will dynamically disqualify themselves.
>>>>
>>>> Remember, you only need the verbs headers to be installed - you don’t have to build on a machine that actually has a verbs-supporting NIC installed (this is how the distributions get around the problem). Thus, it isn’t hard to avoid this portability problem - you just need to think ahead a bit.
>>>>
>>>> HTH
>>>> Ralph
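As a rough sketch of the container-side build Ralph describes (verbs headers present, no Torque), the configure step inside the container image might look something like this; the install prefix and the package providing the headers are assumptions, not details from this thread:

    # Inside the container: install only the verbs development headers/library
    # (e.g. a libibverbs development package on the base distribution),
    # then build Open MPI without tm support.
    ./configure --prefix=/usr/local \
        --with-verbs \
        --disable-dlopen --enable-shared \
        CC=gcc CXX=g++ FC=gfortran
    make -j 4 && make install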
>>>>> On Feb 17, 2017, at 3:49 PM, Bennet Fauber <ben...@umich.edu> wrote:
>>>>>
>>>>> I am wishing to follow the instructions on the Singularity web site,
>>>>>
>>>>> http://singularity.lbl.gov/docs-hpc
>>>>>
>>>>> to test Singularity and OMPI on our cluster. My previously normal configure for the 1.x series looked like this:
>>>>>
>>>>> ./configure --prefix=/usr/local \
>>>>>     --mandir=${PREFIX}/share/man \
>>>>>     --with-tm --with-verbs \
>>>>>     --disable-dlopen --enable-shared \
>>>>>     CC=gcc CXX=g++ FC=gfortran
>>>>>
>>>>> I have a couple of wonderments.
>>>>>
>>>>> First, I presume it will be best to have the same version of OMPI inside the container as out, but how sensitive will it be to minor versions? All 2.1.x versions should be fine, but don't mix 2.1.x outside with 2.2.x inside or vice versa (it might be backward compatible but not forward)?
>>>>>
>>>>> Second, if someone builds OMPI inside their container on an external system, without tm and verbs, then brings the container to our system, will tm and verbs be handled by the calling mpirun from the host system, so that the OMPI inside the container won't care? Will not having those inside the container cause them to be suppressed outside?
>>>>>
>>>>> Thanks in advance,
>>>>> -- bennet
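On the version-matching question, one simple sanity check (a suggestion, not something from this thread) is to compare what the two builds report before running anything across nodes; the image name below is a placeholder:

    # On the host:
    mpirun --version
    ompi_info | grep -i pmix

    # Inside the container (the MCA pmix line only appears for OMPI 2.x and later):
    singularity exec ./ompi-test.img mpirun --version
    singularity exec ./ompi-test.img ompi_info | grep -i pmix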