Dear All,
Good morning.
We successfully implemented a 4-node SLURM cluster with shared storage using
GlusterFS and were able to run COMSOL programs on it. After this learning
experience, we've determined that it would be beneficial to switch to a
commercial SLURM subscription for better suppo
Ah, that's even more fun. I know with Singularity you can launch MPI
applications by calling MPI outside of the container and then having it
link to the internal version:
https://docs.sylabs.io/guides/3.3/user-guide/mpi.html
I'm not sure about Docker, though.
-Paul Edmon-
On 8/12/2024 10:30 AM,
It's in a container. Specifically horovod/horovod on the Docker hub. I'm
going into the container to investigate now (I think I have a link to the
dockerfile as well).
Thanks!
Jeff
On Mon, Aug 12, 2024 at 10:01 AM Paul Edmon wrote:
> Certainly a strange setup. I would probably talk with who e
Certainly a strange setup. I would probably talk with whoever is
providing MPI for you and ask them to build it against Slurm properly.
To get correct process binding, you definitely want it integrated
properly with Slurm via either PMI2 or PMIx. If you just
use the bare hos
Paul,
I tend not to rely on the MPI being built with Slurm :) I find that the
systems I use haven't done that. :( I'm not exactly sure why, but that is
the way it is :)
Up to now, using scontrol has always worked for me. However, a new system
is not cooperating (it is running on the submittal h
Normally MPI will just pick up the host list from Slurm itself. You just
need to build MPI against Slurm and it will just grab it. Typically this
is transparent to the user. Normally you shouldn't need to pass a host
list at all. See: https://slurm.schedmd.com/mpi_guide.html
The canonical way
Thanks! I admit I'm not that experienced in Bash. I will give this a whirl
as a test.
In the meantime, let me ask: what is the "canonical" way to create the host
list? It would be nice to have this in the Slurm FAQ somewhere.
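
For reference, here is a minimal sketch of the scontrol-based approach
mentioned earlier in the thread, for an MPI that was not built against Slurm.
It is not an official Slurm recipe. `scontrol show hostnames` expands the
compressed node list into one hostname per line. The helper name
`make_hostfile`, the output file name `hostfile`, and the Open MPI-style
`slots=` syntax are assumptions for illustration; other MPI implementations
expect a different hostfile format.

```shell
# Hypothetical helper: read one hostname per line on stdin and print
# Open MPI-style hostfile lines ("<host> slots=<n>"). Blank lines are skipped.
make_hostfile() {
    slots="$1"
    while IFS= read -r host; do
        if [ -n "$host" ]; then
            printf '%s slots=%s\n' "$host" "$slots"
        fi
    done
}

# Only runs inside a Slurm job, where scontrol and SLURM_JOB_NODELIST exist.
# SLURM_NTASKS_PER_NODE is set only if --ntasks-per-node was requested,
# so fall back to 1 slot per node (an assumption; adjust as needed).
if command -v scontrol >/dev/null 2>&1 && [ -n "${SLURM_JOB_NODELIST:-}" ]; then
    scontrol show hostnames "$SLURM_JOB_NODELIST" \
        | make_hostfile "${SLURM_NTASKS_PER_NODE:-1}" > hostfile
fi
```

The resulting file could then be passed to the launcher, for example with
Open MPI's `mpirun --hostfile hostfile`. When MPI is built with Slurm support,
none of this should be necessary.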
Thanks!
Jeff
On Fri, Aug 9, 2024 at 1:32 PM Hermann Schwärzler via slu