Hi Jeff,

I think you are now at the "email the system admin to install RPMs" stage. In particular, ask that the numa and udev devel RPMs be installed. They will need to install these RPMs on the compute node image(s) as well.
Howard

From: "Jeffrey D. (JD) Tamucci" <jeffrey.tamu...@uconn.edu>
Date: Wednesday, October 5, 2022 at 9:20 AM
To: "Pritchard Jr., Howard" <howa...@lanl.gov>
Cc: "bbarr...@amazon.com" <bbarr...@amazon.com>, Open MPI Users <users@lists.open-mpi.org>
Subject: Re: [EXTERNAL] [OMPI users] Beginner Troubleshooting OpenMPI Installation - pmi.h Error

Gladly, I tried it that way and it worked in that it was able to find pmi.h. Unfortunately there's a new error about finding lnuma and ludev.

make[2]: Entering directory '/shared/maylab/src/openmpi-4.1.4/opal'
  CCLD     libopen-pal.la
/usr/bin/ld: cannot find -lnuma
/usr/bin/ld: cannot find -ludev
collect2: error: ld returned 1 exit status
make[2]: *** [Makefile:2249: libopen-pal.la] Error 1
make[2]: Leaving directory '/shared/maylab/src/openmpi-4.1.4/opal'
make[1]: *** [Makefile:2394: install-recursive] Error 1
make[1]: Leaving directory '/shared/maylab/src/openmpi-4.1.4/opal'
make: *** [Makefile:1912: install-recursive] Error 1

Here is a dropbox link to the full output:
https://www.dropbox.com/s/4rv8n2yp320ix08/ompi-output_Oct4_2022.tar.bz2?dl=0

Thank you for your help!

JD

Jeffrey D. (JD) Tamucci
University of Connecticut
Molecular & Cell Biology
RA in Lab of Eric R. May
PhD / MPH Candidate
he/him

On Tue, Oct 4, 2022 at 1:51 PM Pritchard Jr., Howard <howa...@lanl.gov> wrote:

*Message sent from a system outside of UConn.*

Could you change the --with-pmi to be

--with-pmi=/cm/shared/apps/slurm21.08.8

?

From: "Jeffrey D. (JD) Tamucci" <jeffrey.tamu...@uconn.edu>
Date: Tuesday, October 4, 2022 at 10:40 AM
To: "Pritchard Jr., Howard" <howa...@lanl.gov>, "bbarr...@amazon.com" <bbarr...@amazon.com>
Cc: Open MPI Users <users@lists.open-mpi.org>
Subject: Re: [EXTERNAL] [OMPI users] Beginner Troubleshooting OpenMPI Installation - pmi.h Error

Hi Howard and Brian,

Of course. Here's a dropbox link to the full folder:
https://www.dropbox.com/s/raqlcnpgk9wz78b/ompi-output_Sep30_2022.tar.bz2?dl=0

This was the configure and make commands:

./configure \
    --prefix=/shared/maylab/mayapps/mpi/openmpi/4.1.4 \
    --with-slurm \
    --with-lsf=no \
    --with-pmi=/cm/shared/apps/slurm/21.08.8/include/slurm \
    --with-pmi-libdir=/cm/shared/apps/slurm/21.08.8/lib64 \
    --with-hwloc=/cm/shared/apps/hwloc/1.11.11 \
    --with-cuda=/gpfs/sharedfs1/admin/hpc2.0/apps/cuda/11.6 \
    --enable-shared \
    --enable-static &&
make -j 32 &&
make -j 32 check
make install

The output of the make command is in the install_open-mpi_4.1.4_hpc2.log file.

Jeffrey D. (JD) Tamucci
University of Connecticut
Molecular & Cell Biology
RA in Lab of Eric R. May
PhD / MPH Candidate
he/him

On Tue, Oct 4, 2022 at 12:33 PM Pritchard Jr., Howard <howa...@lanl.gov> wrote:

*Message sent from a system outside of UConn.*

HI JD,

Could you post the configure options your script uses to build Open MPI?
Howard

From: users <users-boun...@lists.open-mpi.org> on behalf of "Jeffrey D. (JD) Tamucci via users" <users@lists.open-mpi.org>
Reply-To: Open MPI Users <users@lists.open-mpi.org>
Date: Tuesday, October 4, 2022 at 10:07 AM
To: "users@lists.open-mpi.org" <users@lists.open-mpi.org>
Cc: "Jeffrey D. (JD) Tamucci" <jeffrey.tamu...@uconn.edu>
Subject: [EXTERNAL] [OMPI users] Beginner Troubleshooting OpenMPI Installation - pmi.h Error

Hi,

I have been trying to install OpenMPI v4.1.4 on a university HPC cluster. We use the Bright cluster manager and have SLURM v21.08.8 and RHEL 8.6. I used a script to install OpenMPI that a former co-worker had used to successfully install OpenMPI v3.0.0 previously. I updated it to include new versions of the dependencies and new paths to those installs.

Each time, it fails in the make install step. There is a fatal error about finding pmi.h. It specifically says:

make[2]: Entering directory '/shared/maylab/src/openmpi-4.1.4/opal/mca/pmix/s1'
  CC       libmca_pmix_s1_la-pmix_s1_component.lo
  CC       libmca_pmix_s1_la-pmix_s1.lo
pmix_s1.c:29:10: fatal error: pmi.h: No such file or directory
   29 | #include <pmi.h>

I've looked through the archives and seen others face similar errors in years past but I couldn't understand the solutions. One person suggested that SLURM may be missing PMI libraries. I think I've verified that SLURM has PMI. I include paths to those files and it seems to find them earlier in the process.

I'm not sure what the next step is in troubleshooting this. I have included a bz2 file containing my install script, a log file containing the script output (from build, make, make install), the config.log, and the opal_config.h file. If anyone could provide any guidance, I'd sincerely appreciate it.

Best,
JD
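
For readers who hit the same two errors, the checks discussed in this thread can be sketched as a small shell script. This is only an illustration, not something posted by the participants: the function names find_pmi_header and have_dev_lib are hypothetical, the default SLURM_PREFIX is taken from the configure command quoted in the thread, and /usr/lib64 and /usr/lib are assumed to be the system library directories. It assumes that --with-pmi should point at the Slurm install prefix (with pmi.h under include/ or include/slurm/ beneath it), and that "cannot find -lnuma" means the unversioned libnuma.so link, which on RHEL-style systems usually ships in a devel package, is absent.

```shell
#!/bin/sh
# Sketch only: helper names and default paths are assumptions, not part of Open MPI or Slurm.

# Print the path of pmi.h if it exists under PREFIX/include or PREFIX/include/slurm.
find_pmi_header() {
    prefix="$1"
    for dir in "$prefix/include" "$prefix/include/slurm"; do
        if [ -f "$dir/pmi.h" ]; then
            echo "$dir/pmi.h"
            return 0
        fi
    done
    return 1
}

# Return 0 if the unversioned lib<name>.so (what "ld -l<name>" looks for when
# linking shared) exists in any of the given directories.
have_dev_lib() {
    name="$1"
    shift
    for dir in "$@"; do
        if [ -e "$dir/lib$name.so" ]; then
            return 0
        fi
    done
    return 1
}

# Default prefix copied from the thread; override by exporting SLURM_PREFIX.
SLURM_PREFIX="${SLURM_PREFIX:-/cm/shared/apps/slurm/21.08.8}"

if header=$(find_pmi_header "$SLURM_PREFIX"); then
    echo "pmi.h found: $header (try --with-pmi=$SLURM_PREFIX)"
else
    echo "pmi.h not found under $SLURM_PREFIX"
fi

for lib in numa udev; do
    if have_dev_lib "$lib" /usr/lib64 /usr/lib; then
        echo "lib$lib.so present"
    else
        echo "lib$lib.so missing: ask the admin for the matching devel RPM"
    fi
done
```

On RHEL 8 the devel packages in question are most likely numactl-devel and systemd-devel, but confirm the names with the system administrator. As noted at the top of the thread, the packages must be installed on the compute node image(s) as well as on the node where the build runs.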