Note that, depending on your environment, you might need to set these env variables on every node where you're running the Open MPI job. For example:

https://docs.open-mpi.org/en/v5.0.x/launching-apps/quickstart.html#launching-in-a-non-scheduled-environments-via-ssh
and
https://docs.open-mpi.org/en/v5.0.x/launching-apps/ssh.html#finding-open-mpi-executables-and-libraries

________________________________
From: T Brouns <t.s.n.bro...@gmail.com>
Sent: Sunday, May 5, 2024 4:37 PM
To: users@lists.open-mpi.org <users@lists.open-mpi.org>
Cc: Jeff Squyres (jsquyres) <jsquy...@cisco.com>; hear...@gmail.com <hear...@gmail.com>
Subject: Re: [OMPI users] Fwd: Unable to run basic mpirun command (OpenMPI v5.0.3)

Hi all,

I solved the problem by doing:

```
INSTALL_DIR=/usr/local/openmpi-5.0.3
export PATH=$INSTALL_DIR/bin:$PATH
export LD_LIBRARY_PATH=$INSTALL_DIR/lib:$LD_LIBRARY_PATH
export OPAL_PREFIX=$INSTALL_DIR
```

That OPAL_PREFIX line was the tricky one. After doing that, these mpirun commands are now working correctly:

```
mpirun --version
mpirun uptime
```

Thanks for pointing me in the right direction!

@John Hearns, I'm not setting up a Modules environment, but this sounds like a great solution to the problem. I might need to look into that! Thanks.

Best,
Terence

On Sat, 4 May 2024 at 17:22, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:

You might want to see if your OS has Open MPI installed into default binary / library search paths; you might be able to uninstall it easily. Otherwise, even if you explicitly run the mpirun you just built+installed, it might find the libmpi.so from some other copy of Open MPI.

Alternatively, you could prefix your LD_LIBRARY_PATH environment variable with the libdir from the Open MPI installation you just created.

________________________________
From: T Brouns <t.s.n.bro...@gmail.com>
Sent: Saturday, May 4, 2024 10:56 AM
To: Jeff Squyres (jsquyres) <jsquy...@cisco.com>; users@lists.open-mpi.org <users@lists.open-mpi.org>
Subject: Re: [OMPI users] Fwd: Unable to run basic mpirun command (OpenMPI v5.0.3)

Hi Jeff,

I think you're onto something with the multiple copies. For this reason, I also tried to run:

```
/usr/local/openmpi-5.0.3/bin/mpirun --version
```

to make sure I'm running the correct copy, but this one crashes with the same error.

As a next step, I can try to install OpenMPI on a different system to narrow down the problem. Or run it in a Docker container.

And thanks for the pointer on the `mpirun hello_c.c`. This command made no sense.

Best,
Terence

On Sat, 4 May 2024, 14:30 Jeff Squyres (jsquyres), <jsquy...@cisco.com> wrote:

My apologies – I must have somehow been looking at the wrong config.log file. I see there's an extra - in the script on the help page; I'll get that fixed.

Thanks for the tarball; that's easier to get everything. Looking in there, it looks like you built with a prefix of /usr/local/openmpi-5.0.3, but your original email referred to looking for a help file in /usr/share/openmpi/help-mpirun.txt -- this seems to be a disparity. You might want to check that you don't have multiple copies of Open MPI installed, and that you're not running an unexpected copy somewhere – not the one you just built.

Also, your first mail mentioned "mpirun hello_c.c" – you don't want to do that. mpirun is used for launching applications. hello_c.c is the source code – you need to compile it first. In the examples directory, you can run `make`, or you can manually build it via `mpicc hello_c.c -o hello_c`.

________________________________
From: T Brouns <t.s.n.bro...@gmail.com>
Sent: Saturday, May 4, 2024 2:00 AM
To: Jeff Squyres (jsquyres) <jsquy...@cisco.com>
Subject: Re: [OMPI users] Fwd: Unable to run basic mpirun command (OpenMPI v5.0.3)

Hi Jeff,

Thanks for the response.

"Your config.log file shows that you are trying to build Open MPI 2.1.6 and that configure failed."

Where are you seeing version 2.1.6 exactly? Version 5.0.3 is mentioned many times in the config.log file, whereas if I do a recursive search for "2.1.6", it doesn't come up in any of the log files. Also, the configure didn't give any error message. It successfully completed with:

`configure: exit 0`

And I never installed version 2.1.6. Are you sure you are looking at the right file?

"Can you provide all the information from https://docs.open-mpi.org/en/v5.0.x/getting-help.html?
(e.g., tar all the files up in a single file – makes it easier to download and examine everything)"

Here's the TAR file: https://drive.google.com/file/d/19cr7Y4gyCEP0Aa2isTnASItOe9wmfTSK/view?usp=sharing

When I used the first script provided on that webpage, I got the following error:

```
+ tar -x -C /home/jupyter/openmpi-5.0.3/ompi-output -
++ find . -name config.log
+ tar -cf ./3rd-party/libevent-2.1.12-stable/config.log ./3rd-party/openpmix/config.log ./3rd-party/romio341/mpl/config.log ./3rd-party/romio341/config.log ./3rd-party/prrte/config.log ./config.log
tar: This does not look like a tar archive
tar: -: Not found in archive
tar: Exiting with failure status due to previous errors
```

This is why I didn't generate the TAR file in the first place. I fixed the script now.

Best,
Terence

On Fri, 3 May 2024 at 23:43, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:

Your config.log file shows that you are trying to build Open MPI 2.1.6 and that configure failed. I'm not sure how to square this with the information that you provided in your message... did you upload the wrong config.log?

Can you provide all the information from https://docs.open-mpi.org/en/v5.0.x/getting-help.html? (e.g., tar all the files up in a single file – makes it easier to download and examine everything)

________________________________
From: users <users-boun...@lists.open-mpi.org> on behalf of T Brouns via users <users@lists.open-mpi.org>
Sent: Friday, May 3, 2024 4:04 PM
To: users@lists.open-mpi.org <users@lists.open-mpi.org>
Cc: T Brouns <t.s.n.bro...@gmail.com>
Subject: [OMPI users] Fwd: Unable to run basic mpirun command (OpenMPI v5.0.3)

Hello,

I'm experiencing issues running simple `mpirun` commands, after installing OpenMPI v5.0.3.
When I run any command with `mpirun`, for example:

```
mpirun --help
mpirun --version
mpirun uptime
mpirun hello_c.c
```

I end up with the following error (in every case):

```
--------------------------------------------------------------------------
Sorry!  You were supposed to get help about:
    prterun-exec-failed
from the file:
    /usr/share/openmpi/help-mpirun.txt: No such file or directory
But I couldn't find that topic in the file.  Sorry!
--------------------------------------------------------------------------
```

I've installed OpenMPI using these steps: https://docs.open-mpi.org/en/v5.0.x/installing-open-mpi/quickstart.html

When I install an older version of OpenMPI (such as v4.0.5), I end up with the following error instead, when running `mpirun`:

```
--------------------------------------------------------------------------
It looks like opal_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  opal_shmem_base_select failed
  --> Returned value -1 instead of OPAL_SUCCESS
--------------------------------------------------------------------------
```

You can find all the log files over here: https://drive.google.com/drive/folders/163N5Xx5UJZ7fKU172VZSGF2nPY6z0tJF?usp=sharing

Love to get some help on this. Thanks.

Best,
Terence
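
Editor's sketch of the fix reported at the top of this thread, for readers who land on the original question first. It wraps the three exports from Terence's reply in a small POSIX shell function. The function name `use_openmpi` is hypothetical (it is not part of Open MPI), and the default prefix `/usr/local/openmpi-5.0.3` is simply the one used in this thread; adjust it to your own install location. The only change from the posted commands is the `${VAR:+...}` expansion, which avoids leaving a trailing `:` in `LD_LIBRARY_PATH` when the variable was previously empty (an empty entry is treated by the dynamic linker as the current directory).

```shell
#!/bin/sh
# Sketch only: select one Open MPI installation for the current shell.
# use_openmpi is a hypothetical helper name, not an Open MPI command.
use_openmpi() {
    prefix=$1
    # Put this installation's binaries first so `mpirun` resolves to it.
    PATH="$prefix/bin${PATH:+:$PATH}"
    # Prepend its libdir; ${VAR:+...} adds the ':' separator only when
    # LD_LIBRARY_PATH already had a value.
    LD_LIBRARY_PATH="$prefix/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    # Point Open MPI at its own installation tree (the line that was
    # "the tricky one" in this thread).
    OPAL_PREFIX="$prefix"
    export PATH LD_LIBRARY_PATH OPAL_PREFIX
}

# Assumed default: the prefix from this thread; override via INSTALL_DIR.
use_openmpi "${INSTALL_DIR:-/usr/local/openmpi-5.0.3}"

# Show which mpirun the shell will now pick up, if any.
command -v mpirun || echo "mpirun not found on PATH" >&2
```

If `command -v mpirun` prints a path outside your install prefix, another copy of Open MPI is probably still being found first, which is the situation Jeff describes above. As noted at the top of the thread, the same variables may need to be set on every node that runs the job.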