It's in a container. Specifically, horovod/horovod on Docker Hub. I'm going into the container to investigate now (I think I have a link to the Dockerfile as well).
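For reference, this is the check I plan to run against the image, assuming it ships OpenMPI with ompi_info on the PATH (a quick sketch, not yet verified against this particular image):

    # List the MCA components OpenMPI was built with; slurm/pmi entries
    # indicate the launcher can integrate with Slurm directly.
    docker run --rm horovod/horovod ompi_info | grep -Ei 'slurm|pmi'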
Thanks!

Jeff

On Mon, Aug 12, 2024 at 10:01 AM Paul Edmon <ped...@cfa.harvard.edu> wrote:

> Certainly a strange setup. I would probably talk with whoever is
> providing MPI for you and ask them to build it against Slurm properly.
> To get correct process binding you definitely want it integrated
> properly with Slurm, either via PMI2 or PMIx. If you just use the bare
> hostlist, your ranks may not end up properly bound to the specific
> cores they were allocated. So definitely proceed with caution and
> validate that your ranks are being laid out properly, as you will be
> relying on mpirun/mpiexec to bootstrap rather than the scheduler.
>
> -Paul Edmon-
>
> On 8/12/2024 9:55 AM, Jeffrey Layton wrote:
>
> Paul,
>
> I tend not to rely on the MPI being built with Slurm :) I find that the
> systems I use haven't done that. :( I'm not exactly sure why, but that
> is the way it is :)
>
> Up to now, using scontrol has always worked for me. However, a new
> system is not cooperating (the job is running on the submit host and
> not the compute nodes) and I'm trying to debug it. My first step was to
> check that the job was getting the compute node names (the list of
> nodes from Slurm is empty). This led to my question about the
> "canonical" way to get the hostlist (I tried both passing the hostlist
> explicitly and relying on Slurm being integrated into the MPI; neither
> works since the hostlist is empty).
>
> It looks like there is a canonical way to do it, as you mentioned. FAQ
> worthy? Definitely for my own Slurm FAQ. Others will decide if it is
> worthy of the Slurm docs :)
>
> Thanks everyone for your help!
>
> Jeff
>
> On Mon, Aug 12, 2024 at 9:36 AM Paul Edmon via slurm-users <
> slurm-users@lists.schedmd.com> wrote:
>
>> Normally MPI will just pick up the host list from Slurm itself. You
>> just need to build MPI against Slurm and it will grab it; typically
>> this is transparent to the user. Normally you shouldn't need to pass a
>> host list at all. See: https://slurm.schedmd.com/mpi_guide.html
>>
>> If you do need one, the canonical way is the scontrol show hostnames
>> command against $SLURM_JOB_NODELIST
>> (https://slurm.schedmd.com/scontrol.html#OPT_hostnames). That will
>> give you the list of hosts your job is set to run on.
>>
>> -Paul Edmon-
>>
>> On 8/12/2024 8:34 AM, Jeffrey Layton via slurm-users wrote:
>>
>> Thanks! I admit I'm not that experienced in Bash. I will give this a
>> whirl as a test.
>>
>> In the meantime, let me ask: what is the "canonical" way to create the
>> host list? It would be nice to have this in the Slurm FAQ somewhere.
>>
>> Thanks!
>>
>> Jeff
>>
>> On Fri, Aug 9, 2024 at 1:32 PM Hermann Schwärzler via slurm-users <
>> slurm-users@lists.schedmd.com> wrote:
>>
>>> Hi Paul,
>>>
>>> On 8/9/24 18:45, Paul Edmon via slurm-users wrote:
>>> > As I recall, OpenMPI needs a list that has one entry per line,
>>> > rather than entries separated by spaces. See:
>>> >
>>> > [root@holy7c26401 ~]# echo $SLURM_JOB_NODELIST
>>> > holy7c[26401-26405]
>>> > [root@holy7c26401 ~]# scontrol show hostnames $SLURM_JOB_NODELIST
>>> > holy7c26401
>>> > holy7c26402
>>> > holy7c26403
>>> > holy7c26404
>>> > holy7c26405
>>> >
>>> > [root@holy7c26401 ~]# list=$(scontrol show hostname $SLURM_NODELIST)
>>> > [root@holy7c26401 ~]# echo $list
>>> > holy7c26401 holy7c26402 holy7c26403 holy7c26404 holy7c26405
>>>
>>> Proper quoting does wonders here (please consult the bash man page).
>>> If you try
>>>
>>> echo "$list"
>>>
>>> you will see that you get
>>>
>>> holy7c26401
>>> holy7c26402
>>> holy7c26403
>>> holy7c26404
>>> holy7c26405
>>>
>>> So you *can* pass this around in a variable if you use "$variable"
>>> whenever you provide it to a utility.
>>>
>>> Regards,
>>> Hermann
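For my own Slurm FAQ, here is the pattern this thread converged on, as a
minimal untested sketch (my_app and the resource sizes are placeholders,
and srun --mpi=pmix only works if Slurm itself was built with PMIx):

    #!/bin/bash
    #SBATCH --nodes=4
    #SBATCH --ntasks-per-node=8

    # Preferred: let Slurm bootstrap the ranks directly. This requires
    # an MPI built against Slurm with PMI2/PMIx support.
    #srun --mpi=pmix ./my_app

    # Fallback: expand the compact nodelist into one hostname per line,
    # the format OpenMPI's mpirun expects in a hostfile. The quotes
    # around the variable preserve the newlines.
    hosts="$(scontrol show hostnames "$SLURM_JOB_NODELIST")"
    echo "$hosts" > hostfile.$SLURM_JOB_ID

    mpirun --hostfile hostfile.$SLURM_JOB_ID -np "$SLURM_NTASKS" ./my_app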
--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com