So I missed the step of building PMIx separately. I thought that the PMIx embedded inside Open MPI could be used by SLURM.
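
If I understand the advice below correctly, the full build order would then be roughly the following (a minimal sketch on my part; the PMIx version, the /usr prefix, and the extra Open MPI flags are my assumptions, not something prescribed in the thread):

    # 1. install an external PMIx first (tarball from pmix.org, or a distro package)
    tar xjf pmix-3.1.2.tar.bz2 && cd pmix-3.1.2
    ./configure --prefix=/usr
    make && make install

    # 2. build SLURM on top of that external PMIx
    cd ../slurm-18.08.5-2
    ./configure --with-pmix=/usr
    make && make install

    # 3. build Open MPI against the same external PMIx (or its embedded one)
    # (depending on how PMIx was built, Open MPI may also need --with-libevent
    # pointing at the same libevent that PMIx was linked against)
    cd ../openmpi-4.0.0
    ./configure --with-pmix=/usr --with-slurm
    make && make install

If that works, I would expect srun --mpi=list to also show a pmix entry, so the test program could be launched as srun --mpi=pmix ./hellompi.
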
> On Mar 14, 2019 at 9:32 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
>
> Riccardo,
>
> I am a bit confused by your explanation.
>
> Open MPI does embed PMIx, but only for itself.
>
> Another way to put it is that you have to install PMIx first (from a
> package, or downloaded from pmix.org) and then build SLURM on top of it.
>
> Then you can build Open MPI with the same (external) PMIx or the
> embedded one (since PMIx offers cross-version compatibility support).
>
> Cheers,
>
> Gilles
>
> On 3/15/2019 12:24 PM, Riccardo Veraldi wrote:
> > thanks to all.
> > The problem is that SLURM's configure is not able to find the PMIx
> > includes:
> >
> > configure:20846: checking for pmix installation
> > configure:21005: result:
> > configure:21021: WARNING: unable to locate pmix installation
> >
> > regardless of the path I give.
> > The reason is that configure searches for the following includes:
> >
> > test -f "$d/include/pmix/pmix_common.h"
> > test -f "$d/include/pmix_server.h"
> >
> > but neither of the two is installed by Open MPI.
> >
> > One of the two is in the Open MPI source code tarball:
> >
> > ./opal/mca/pmix/pmix3x/pmix/include/pmix_server.h
> >
> > The other one only exists as a ".h.in" template, not a ".h":
> >
> > ./opal/mca/pmix/pmix3x/pmix/include/pmix_common.h.in
> >
> > Either way, they do not get installed by the rpm.
> >
> > The last thing I can try is to build Open MPI directly from source and
> > give up on the rpm package build. The Open MPI .spec also has errors
> > which I had to fix manually to get it to build successfully.
> >
> > On 3/12/19 4:56 PM, Daniel Letai wrote:
> >> Hi.
> >> On 12/03/2019 22:53:36, Riccardo Veraldi wrote:
> >>> Hello,
> >>> after trying hard for over 10 days I am forced to write to the list.
> >>> I am not able to get SLURM to work with Open MPI. Open MPI-compiled
> >>> binaries won't run under SLURM, while all non-Open MPI programs run
> >>> just fine under "srun". I am using SLURM 18.08.5, building the rpm
> >>> from the tarball: rpmbuild -ta slurm-18.08.5-2.tar.bz2
> >>> Prior to building SLURM I installed Open MPI 4.0.0, which has
> >>> built-in PMIx support. The PMIx libraries are in /usr/lib64/pmix/,
> >>> which is the default installation path.
> >>>
> >>> The problem is that hellompi does not work if I launch it from srun;
> >>> of course it runs fine outside SLURM.
> >>>
> >>> [psanagpu105:10995] OPAL ERROR: Not initialized in file
> >>> pmix3x_client.c at line 113
> >>> --------------------------------------------------------------------------
> >>> The application appears to have been direct launched using "srun",
> >>> but OMPI was not built with SLURM's PMI support and therefore cannot
> >>> execute. There are several options for building PMI support under
> >>
> >> I would guess (but having the config.log files would verify it) that
> >> you should rebuild Slurm --with-pmix and then rebuild OpenMPI
> >> --with-slurm.
> >>
> >> Currently there might be a bug in Slurm's configure file when building
> >> PMIx support without a path, so you might either modify the spec
> >> before building (add --with-pmix=/usr to the configure section) or,
> >> for testing purposes, run ./configure --with-pmix=/usr; make; make
> >> install.
> >>
> >> It seems your current configuration has a built-in mismatch: Slurm
> >> only supports PMI2, while OpenMPI only supports PMIx. You should build
> >> with at least one common PMI: either external PMIx when building
> >> Slurm, or Slurm's PMI2 when building OpenMPI.
> >>
> >> However, I would have expected the non-PMI option (srun
> >> --mpi=openmpi) to work even in your environment, and Slurm should have
> >> built PMIx support automatically since it's in the default search
> >> path.
> >>
> >>> SLURM, depending upon the SLURM version you are using:
> >>>
> >>> version 16.05 or later: you can use SLURM's PMIx support. This
> >>> requires that you configure and build SLURM --with-pmix.
> >>>
> >>> Versions earlier than 16.05: you must use either SLURM's PMI-1 or
> >>> PMI-2 support. SLURM builds PMI-1 by default, or you can manually
> >>> install PMI-2. You must then build Open MPI using --with-pmi pointing
> >>> to the SLURM PMI library location.
> >>>
> >>> Please configure as appropriate and try again.
> >>> --------------------------------------------------------------------------
> >>>
> >>> *** An error occurred in MPI_Init
> >>> *** on a NULL communicator
> >>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now
> >>> abort, and potentially your MPI job)
> >>> [psanagpu105:10995] Local abort before MPI_INIT completed completed
> >>> successfully, but am not able to aggregate error messages, and not
> >>> able to guarantee that all other processes were killed!
> >>> srun: error: psanagpu105: task 0: Exited with exit code 1
> >>>
> >>> I really have no clue. I even reinstalled Open MPI on a different
> >>> path, /opt/openmpi/4.0.0.
> >>> Anyway, it seems like SLURM does not know how to find the MPI
> >>> libraries even though they are there, currently in the default path
> >>> /usr/lib64.
> >>>
> >>> Even using --mpi=pmi2 or --mpi=openmpi does not fix the problem, and
> >>> the same error message is given to me.
> >>> srun --mpi=list
> >>> srun: MPI types are...
> >>> srun: none
> >>> srun: openmpi
> >>> srun: pmi2
> >>>
> >>> Any hint on how I could fix this problem?
> >>> Thanks a lot,
> >>>
> >>> Rick
> >>>
> >> --
> >> Regards,
> >>
> >> Dani_L.
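
One more note to close the loop: since SLURM's configure tests are quoted above, the same checks can be run by hand before rebuilding, to confirm that an external PMIx installation is actually visible (assuming here that PMIx was installed with --prefix=/usr; a standalone PMIx puts its headers directly under $prefix/include):

    d=/usr
    test -f "$d/include/pmix/pmix_common.h" && echo "found pmix/pmix_common.h"
    test -f "$d/include/pmix_server.h"      && echo "found pmix_server.h"

If I read the configure snippet right, finding either file should be enough for --with-pmix=$d to be accepted; with only the Open MPI rpm installed, both checks fail, which matches the WARNING above.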