Paul,

I tend not to rely on the MPI being built with Slurm :)  I find that the
systems I use haven't done that. :(  I'm not exactly sure why, but that is
the way it is :)

Up to now, using scontrol has always worked for me. However, a new system
is not cooperating (the job runs on the submission host and not on the
compute nodes) and I'm trying to debug it. My first step was to check that
the job was getting the compute node names (it isn't: the list of nodes from
Slurm is empty). This led to my question about the "canonical" way to get
the hostlist (I tried both passing the hostlist and just relying on Slurm
being integrated into the MPI; neither works, since the hostlist is empty).
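
A minimal sketch of that first check, assuming the script runs under bash.
check_nodelist is a hypothetical helper name, not a Slurm command;
SLURM_JOB_NODELIST is the variable Slurm sets inside a job allocation. If
the variable is empty and the host printed is the submission host, one thing
worth ruling out is that the script is not running inside an allocation at
all (a guess at the cause, not something confirmed in this thread).

```shell
#!/usr/bin/env bash
# check_nodelist: hypothetical debugging helper (not part of Slurm).
# Prints the host the script is running on and the node list Slurm exported.
# Returns non-zero if SLURM_JOB_NODELIST is empty or unset.
check_nodelist() {
    # uname -n prints the name of the host executing this script.
    echo "running on: $(uname -n)"

    if [ -z "${SLURM_JOB_NODELIST:-}" ]; then
        echo "SLURM_JOB_NODELIST is empty or unset" >&2
        return 1
    fi

    echo "SLURM_JOB_NODELIST=${SLURM_JOB_NODELIST}"
}
```

Calling check_nodelist near the top of the batch script puts both facts in
the job output before any MPI launch is attempted.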

It looks like there is a canonical way to do it as you mentioned. FAQ
worthy? Definitely for my own Slurm FAQ. Others will decide if it is worthy
for Slurm docs :)
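
For my own FAQ, here is a sketch of that approach: expand the compact node
list with scontrol show hostnames and write one host per line to a file.
write_hostfile and the default file name hostfile.txt are hypothetical names
chosen for illustration; only scontrol show hostnames and
SLURM_JOB_NODELIST come from this thread.

```shell
#!/usr/bin/env bash
# write_hostfile: hypothetical helper (not part of Slurm).
#   $1  compact node list (defaults to $SLURM_JOB_NODELIST)
#   $2  output file       (defaults to hostfile.txt, an assumed name)
write_hostfile() {
    local nodelist="${1:-${SLURM_JOB_NODELIST:-}}"
    local outfile="${2:-hostfile.txt}"

    if [ -z "$nodelist" ]; then
        echo "no node list available" >&2
        return 1
    fi

    # Expand e.g. holy7c[26401-26405] into one host name per line.
    local hosts
    hosts=$(scontrol show hostnames "$nodelist") || return 1

    # Quote the variable so the newlines between host names are kept;
    # an unquoted $hosts would be word-split and joined by spaces.
    printf '%s\n' "$hosts" > "$outfile"
}
```

Inside a job, write_hostfile "$SLURM_JOB_NODELIST" hosts.txt would leave a
file with one node per line, which can then be handed to the MPI launcher.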

Thanks everyone for your help!

Jeff


On Mon, Aug 12, 2024 at 9:36 AM Paul Edmon via slurm-users <
slurm-users@lists.schedmd.com> wrote:

> Normally MPI will just pick up the host list from Slurm itself. You just
> need to build MPI against Slurm and it will just grab it. Typically this is
> transparent to the user. Normally you shouldn't need to pass a host list at
> all. See: https://slurm.schedmd.com/mpi_guide.html
>
> The canonical way to do it if you need to would be the scontrol show
> hostnames command against the $SLURM_JOB_NODELIST (
> https://slurm.schedmd.com/scontrol.html#OPT_hostnames). That will give
> you the list of hosts your job is set to run on.
>
> -Paul Edmon-
> On 8/12/2024 8:34 AM, Jeffrey Layton via slurm-users wrote:
>
> Thanks! I admit I'm not that experienced in Bash. I will give this a whirl
> as a test.
>
> In the meantime, let me ask, what is the "canonical" way to create the host
> list? It would be nice to have this in the Slurm FAQ somewhere.
>
> Thanks!
>
> Jeff
>
>
>
> On Fri, Aug 9, 2024 at 1:32 PM Hermann Schwärzler via slurm-users <
> slurm-users@lists.schedmd.com> wrote:
>
>> Hi Paul,
>>
>> On 8/9/24 18:45, Paul Edmon via slurm-users wrote:
>> > As I recall I think OpenMPI needs a list that has an entry on each line,
>> > rather than one separated by a space. See:
>> >
>> > [root@holy7c26401 ~]# echo $SLURM_JOB_NODELIST
>> > holy7c[26401-26405]
>> > [root@holy7c26401 ~]# scontrol show hostnames $SLURM_JOB_NODELIST
>> > holy7c26401
>> > holy7c26402
>> > holy7c26403
>> > holy7c26404
>> > holy7c26405
>> >
>> > [root@holy7c26401 ~]# list=$(scontrol show hostname $SLURM_NODELIST)
>> > [root@holy7c26401 ~]# echo $list
>> > holy7c26401 holy7c26402 holy7c26403 holy7c26404 holy7c26405
>>
>> proper quoting does wonders here (please consult the man-page of bash).
>> If you try
>>
>> echo "$list"
>>
>> you will see that you will get
>>
>> holy7c26401
>> holy7c26402
>> holy7c26403
>> holy7c26404
>> holy7c26405
>>
>> So you *can* pass this around in a variable if you use "$variable"
>> whenever you provide it to a utility.
>>
>> Regards,
>> Hermann
>>
>> --
>> slurm-users mailing list -- slurm-users@lists.schedmd.com
>> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>>
>
>