[slurm-users] Re: Annoying canonical question about converting SLURM_JOB_NODELIST to a host list for mpirun

2024-08-12 Thread Jeffrey Layton via slurm-users
Thanks! I admit I'm not that experienced in Bash. I will give this a whirl
as a test.

In the meantime, let me ask: what is the "canonical" way to create the host
list? It would be nice to have this in the Slurm FAQ somewhere.

Thanks!

Jeff



On Fri, Aug 9, 2024 at 1:32 PM Hermann Schwärzler via slurm-users <
slurm-users@lists.schedmd.com> wrote:

> Hi Paul,
>
> On 8/9/24 18:45, Paul Edmon via slurm-users wrote:
> > As I recall I think OpenMPI needs a list that has an entry on each line,
> > rather than one separated by a space. See:
> >
> > [root@holy7c26401 ~]# echo $SLURM_JOB_NODELIST
> > holy7c[26401-26405]
> > [root@holy7c26401 ~]# scontrol show hostnames $SLURM_JOB_NODELIST
> > holy7c26401
> > holy7c26402
> > holy7c26403
> > holy7c26404
> > holy7c26405
> >
> > [root@holy7c26401 ~]# list=$(scontrol show hostname $SLURM_NODELIST)
> > [root@holy7c26401 ~]# echo $list
> > holy7c26401 holy7c26402 holy7c26403 holy7c26404 holy7c26405
>
> proper quoting does wonders here (please consult the man-page of bash).
> If you try
>
> echo "$list"
>
> you will see that you will get
>
> holy7c26401
> holy7c26402
> holy7c26403
> holy7c26404
> holy7c26405
>
> So you *can* pass this around in a variable if you use "$variable"
> whenever you provide it to a utility.
>
> Regards,
> Hermann
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Annoying canonical question about converting SLURM_JOB_NODELIST to a host list for mpirun

2024-08-12 Thread Paul Edmon via slurm-users
Normally MPI will just pick up the host list from Slurm itself. You just 
need to build MPI against Slurm and it will just grab it. Typically this 
is transparent to the user. Normally you shouldn't need to pass a host 
list at all. See: https://slurm.schedmd.com/mpi_guide.html


The canonical way to do it if you need to would be the scontrol show 
hostnames command against the $SLURM_JOB_NODELIST 
(https://slurm.schedmd.com/scontrol.html#OPT_hostnames). That will give 
you the list of hosts your job is set to run on.
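
As an illustration, a minimal batch-script sketch (the script and binary names here are made up, and it assumes an OpenMPI-style mpirun that reads one host per line from --hostfile):

#!/bin/bash
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=1

# Expand the compressed nodelist (e.g. holy7c[26401-26405]) into one host per line
scontrol show hostnames "$SLURM_JOB_NODELIST" > hostfile.$SLURM_JOB_ID

# Hand that file to mpirun; my_mpi_app is a placeholder for your binary
mpirun -np "$SLURM_NTASKS" --hostfile hostfile.$SLURM_JOB_ID ./my_mpi_app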


-Paul Edmon-

On 8/12/2024 8:34 AM, Jeffrey Layton via slurm-users wrote:
Thanks! I admit I'm not that experienced in Bash. I will give this a 
whirl as a test.


In the meantime, let me ask: what is the "canonical" way to create the 
host list? It would be nice to have this in the Slurm FAQ somewhere.


Thanks!

Jeff



On Fri, Aug 9, 2024 at 1:32 PM Hermann Schwärzler via slurm-users 
<slurm-users@lists.schedmd.com> wrote:


Hi Paul,

On 8/9/24 18:45, Paul Edmon via slurm-users wrote:
> As I recall I think OpenMPI needs a list that has an entry on each line,
> rather than one separated by a space. See:
>
> [root@holy7c26401 ~]# echo $SLURM_JOB_NODELIST
> holy7c[26401-26405]
> [root@holy7c26401 ~]# scontrol show hostnames $SLURM_JOB_NODELIST
> holy7c26401
> holy7c26402
> holy7c26403
> holy7c26404
> holy7c26405
>
> [root@holy7c26401 ~]# list=$(scontrol show hostname $SLURM_NODELIST)
> [root@holy7c26401 ~]# echo $list
> holy7c26401 holy7c26402 holy7c26403 holy7c26404 holy7c26405

proper quoting does wonders here (please consult the man-page of bash).
If you try

echo "$list"

you will see that you will get

holy7c26401
holy7c26402
holy7c26403
holy7c26404
holy7c26405

So you *can* pass this around in a variable if you use "$variable"
whenever you provide it to a utility.

Regards,
Hermann

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com

To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Annoying canonical question about converting SLURM_JOB_NODELIST to a host list for mpirun

2024-08-12 Thread Jeffrey Layton via slurm-users
Paul,

I tend not to rely on the MPI being built with Slurm :)  I find that the
systems I use haven't done that. :(  I'm not exactly sure why, but that is
the way it is :)

Up to now, using scontrol has always worked for me. However, a new system
is not cooperating (the job runs on the submission host and not the compute
nodes) and I'm trying to debug it. My first step was to check that the job
was getting the compute node names (the list of nodes from Slurm is
empty). This led to my question about the "canonical" way to get the
hostlist (I'm checking both by using the hostlist and by relying on Slurm being
integrated into the MPI - neither works, since the hostlist is empty).
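
For reference, that check amounts to a few lines like these in the job script (just a sketch):

echo "SLURM_JOB_ID:        $SLURM_JOB_ID"
echo "SLURM_JOB_NODELIST:  $SLURM_JOB_NODELIST"
echo "SLURM_JOB_NUM_NODES: $SLURM_JOB_NUM_NODES"
scontrol show hostnames "$SLURM_JOB_NODELIST"

If SLURM_JOB_NODELIST comes back empty there, the problem is upstream of any hostfile handling.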

It looks like there is a canonical way to do it as you mentioned. FAQ
worthy? Definitely for my own Slurm FAQ. Others will decide if it is worthy
for Slurm docs :)

Thanks everyone for your help!

Jeff


On Mon, Aug 12, 2024 at 9:36 AM Paul Edmon via slurm-users <
slurm-users@lists.schedmd.com> wrote:

> Normally MPI will just pick up the host list from Slurm itself. You just
> need to build MPI against Slurm and it will just grab it. Typically this is
> transparent to the user. Normally you shouldn't need to pass a host list at
> all. See: https://slurm.schedmd.com/mpi_guide.html
>
> The canonical way to do it if you need to would be the scontrol show
> hostnames command against the $SLURM_JOB_NODELIST (
> https://slurm.schedmd.com/scontrol.html#OPT_hostnames). That will give
> you the list of hosts your job is set to run on.
>
> -Paul Edmon-
> On 8/12/2024 8:34 AM, Jeffrey Layton via slurm-users wrote:
>
> Thanks! I admit I'm not that experienced in Bash. I will give this a whirl
> as a test.
>
> In the meantime, let me ask: what is the "canonical" way to create the host
> list? It would be nice to have this in the Slurm FAQ somewhere.
>
> Thanks!
>
> Jeff
>
>
>
> On Fri, Aug 9, 2024 at 1:32 PM Hermann Schwärzler via slurm-users <
> slurm-users@lists.schedmd.com> wrote:
>
>> Hi Paul,
>>
>> On 8/9/24 18:45, Paul Edmon via slurm-users wrote:
>> > As I recall I think OpenMPI needs a list that has an entry on each line,
>> > rather than one separated by a space. See:
>> >
>> > [root@holy7c26401 ~]# echo $SLURM_JOB_NODELIST
>> > holy7c[26401-26405]
>> > [root@holy7c26401 ~]# scontrol show hostnames $SLURM_JOB_NODELIST
>> > holy7c26401
>> > holy7c26402
>> > holy7c26403
>> > holy7c26404
>> > holy7c26405
>> >
>> > [root@holy7c26401 ~]# list=$(scontrol show hostname $SLURM_NODELIST)
>> > [root@holy7c26401 ~]# echo $list
>> > holy7c26401 holy7c26402 holy7c26403 holy7c26404 holy7c26405
>>
>> proper quoting does wonders here (please consult the man-page of bash).
>> If you try
>>
>> echo "$list"
>>
>> you will see that you will get
>>
>> holy7c26401
>> holy7c26402
>> holy7c26403
>> holy7c26404
>> holy7c26405
>>
>> So you *can* pass this around in a variable if you use "$variable"
>> whenever you provide it to a utility.
>>
>> Regards,
>> Hermann
>>
>> --
>> slurm-users mailing list -- slurm-users@lists.schedmd.com
>> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>>
>
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Annoying canonical question about converting SLURM_JOB_NODELIST to a host list for mpirun

2024-08-12 Thread Paul Edmon via slurm-users
Certainly a strange setup. I would probably talk with whoever is 
providing MPI for you and ask them to build it against Slurm properly. 
In order to get correct process binding you definitely want to have 
it integrated properly with Slurm, either via PMI2 or PMIx. If you just 
use the bare hostlist, your ranks may not end up properly bound to the 
specific cores they were allocated. So definitely proceed 
with caution and validate that your ranks are being laid out properly, as you 
will be relying on mpirun/mpiexec to bootstrap rather than the scheduler.
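
For what it's worth, a quick way to compare the two launch paths and eyeball the resulting layout (assuming OpenMPI's mpirun and a Slurm built with PMIx support; the binary name is a placeholder):

# Scheduler-integrated launch: Slurm places and binds the ranks
srun --mpi=pmix --cpu-bind=verbose ./my_mpi_app

# Bare-hostlist launch: mpirun bootstraps itself, so ask it to report its bindings
scontrol show hostnames "$SLURM_JOB_NODELIST" > hostfile.$SLURM_JOB_ID
mpirun -np "$SLURM_NTASKS" --hostfile hostfile.$SLURM_JOB_ID --report-bindings ./my_mpi_app

Comparing the two outputs makes it obvious whether the bare-hostlist run is landing ranks where the allocation says they should be.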


-Paul Edmon-

On 8/12/2024 9:55 AM, Jeffrey Layton wrote:

Paul,

I tend not to rely on the MPI being built with Slurm :)  I find that 
the systems I use haven't done that. :(  I'm not exactly sure why, but 
that is the way it is :)


Up to now, using scontrol has always worked for me. However, a new 
system is not cooperating (the job runs on the submission host and not 
the compute nodes) and I'm trying to debug it. My first step was to 
check that the job was getting the compute node names (the list of 
nodes from Slurm is empty). This led to my question about the 
"canonical" way to get the hostlist (I'm checking both by using the hostlist 
and by relying on Slurm being integrated into the MPI - neither works, 
since the hostlist is empty).


It looks like there is a canonical way to do it as you mentioned. FAQ 
worthy? Definitely for my own Slurm FAQ. Others will decide if it is 
worthy for Slurm docs :)


Thanks everyone for your help!

Jeff


On Mon, Aug 12, 2024 at 9:36 AM Paul Edmon via slurm-users 
<slurm-users@lists.schedmd.com> wrote:


Normally MPI will just pick up the host list from Slurm itself.
You just need to build MPI against Slurm and it will just grab it.
Typically this is transparent to the user. Normally you shouldn't
need to pass a host list at all. See:
https://slurm.schedmd.com/mpi_guide.html

The canonical way to do it if you need to would be the scontrol
show hostnames command against the $SLURM_JOB_NODELIST
(https://slurm.schedmd.com/scontrol.html#OPT_hostnames). That will
give you the list of hosts your job is set to run on.

-Paul Edmon-

On 8/12/2024 8:34 AM, Jeffrey Layton via slurm-users wrote:

Thanks! I admit I'm not that experienced in Bash. I will give
this a whirl as a test.

In the meantime, let me ask: what is the "canonical" way to create
the host list? It would be nice to have this in the Slurm FAQ
somewhere.

Thanks!

Jeff



On Fri, Aug 9, 2024 at 1:32 PM Hermann Schwärzler via slurm-users
<slurm-users@lists.schedmd.com> wrote:

Hi Paul,

On 8/9/24 18:45, Paul Edmon via slurm-users wrote:
> As I recall I think OpenMPI needs a list that has an entry on each line,
> rather than one separated by a space. See:
>
> [root@holy7c26401 ~]# echo $SLURM_JOB_NODELIST
> holy7c[26401-26405]
> [root@holy7c26401 ~]# scontrol show hostnames $SLURM_JOB_NODELIST
> holy7c26401
> holy7c26402
> holy7c26403
> holy7c26404
> holy7c26405
>
> [root@holy7c26401 ~]# list=$(scontrol show hostname $SLURM_NODELIST)
> [root@holy7c26401 ~]# echo $list
> holy7c26401 holy7c26402 holy7c26403 holy7c26404 holy7c26405

proper quoting does wonders here (please consult the man-page of bash).
If you try

echo "$list"

you will see that you will get

holy7c26401
holy7c26402
holy7c26403
holy7c26404
holy7c26405

So you *can* pass this around in a variable if you use
"$variable"
whenever you provide it to a utility.

Regards,
Hermann

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com

To unsubscribe send an email to
slurm-users-le...@lists.schedmd.com




-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com

To unsubscribe send an email to slurm-users-le...@lists.schedmd.com

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Annoying canonical question about converting SLURM_JOB_NODELIST to a host list for mpirun

2024-08-12 Thread Jeffrey Layton via slurm-users
It's in a container - specifically horovod/horovod on Docker Hub. I'm
going into the container to investigate now (I think I have a link to the
Dockerfile as well).
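
For context, the kind of conversion I'm after looks roughly like this on the Horovod side (a sketch only - it assumes horovodrun's -np/-H interface, one slot per node, and a placeholder train.py):

# Turn the Slurm nodelist into horovodrun's comma-separated host:slots form
hosts=$(scontrol show hostnames "$SLURM_JOB_NODELIST" | sed 's/$/:1/' | paste -sd, -)
horovodrun -np "$SLURM_JOB_NUM_NODES" -H "$hosts" python train.py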

Thanks!

Jeff


On Mon, Aug 12, 2024 at 10:01 AM Paul Edmon  wrote:

> Certainly a strange setup. I would probably talk with whoever is
> providing MPI for you and ask them to build it against Slurm properly. In
> order to get correct process binding you definitely want to have it
> integrated properly with Slurm, either via PMI2 or PMIx. If you just use the
> bare hostlist, your ranks may not end up properly bound to the specific
> cores they were allocated. So definitely proceed with caution
> and validate that your ranks are being laid out properly, as you will be relying
> on mpirun/mpiexec to bootstrap rather than the scheduler.
>
> -Paul Edmon-
> On 8/12/2024 9:55 AM, Jeffrey Layton wrote:
>
> Paul,
>
> I tend not to rely on the MPI being built with Slurm :)  I find that the
> systems I use haven't done that. :(  I'm not exactly sure why, but that is
> the way it is :)
>
> Up to now, using scontrol has always worked for me. However, a new system
> is not cooperating (the job runs on the submission host and not the compute
> nodes) and I'm trying to debug it. My first step was to check that the job
> was getting the compute node names (the list of nodes from Slurm is
> empty). This led to my question about the "canonical" way to get the
> hostlist (I'm checking both by using the hostlist and by relying on Slurm being
> integrated into the MPI - neither works, since the hostlist is empty).
>
> It looks like there is a canonical way to do it as you mentioned. FAQ
> worthy? Definitely for my own Slurm FAQ. Others will decide if it is worthy
> for Slurm docs :)
>
> Thanks everyone for your help!
>
> Jeff
>
>
> On Mon, Aug 12, 2024 at 9:36 AM Paul Edmon via slurm-users <
> slurm-users@lists.schedmd.com> wrote:
>
>> Normally MPI will just pick up the host list from Slurm itself. You just
>> need to build MPI against Slurm and it will just grab it. Typically this is
>> transparent to the user. Normally you shouldn't need to pass a host list at
>> all. See: https://slurm.schedmd.com/mpi_guide.html
>>
>> The canonical way to do it if you need to would be the scontrol show
>> hostnames command against the $SLURM_JOB_NODELIST (
>> https://slurm.schedmd.com/scontrol.html#OPT_hostnames). That will give
>> you the list of hosts your job is set to run on.
>>
>> -Paul Edmon-
>> On 8/12/2024 8:34 AM, Jeffrey Layton via slurm-users wrote:
>>
>> Thanks! I admit I'm not that experienced in Bash. I will give this a
>> whirl as a test.
>>
>> In the meantime, let me ask: what is the "canonical" way to create the host
>> list? It would be nice to have this in the Slurm FAQ somewhere.
>>
>> Thanks!
>>
>> Jeff
>>
>>
>>
>> On Fri, Aug 9, 2024 at 1:32 PM Hermann Schwärzler via slurm-users <
>> slurm-users@lists.schedmd.com> wrote:
>>
>>> Hi Paul,
>>>
>>> On 8/9/24 18:45, Paul Edmon via slurm-users wrote:
>>> > As I recall I think OpenMPI needs a list that has an entry on each
>>> > line,
>>> > rather than one separated by a space. See:
>>> >
>>> > [root@holy7c26401 ~]# echo $SLURM_JOB_NODELIST
>>> > holy7c[26401-26405]
>>> > [root@holy7c26401 ~]# scontrol show hostnames $SLURM_JOB_NODELIST
>>> > holy7c26401
>>> > holy7c26402
>>> > holy7c26403
>>> > holy7c26404
>>> > holy7c26405
>>> >
>>> > [root@holy7c26401 ~]# list=$(scontrol show hostname $SLURM_NODELIST)
>>> > [root@holy7c26401 ~]# echo $list
>>> > holy7c26401 holy7c26402 holy7c26403 holy7c26404 holy7c26405
>>>
>>> proper quoting does wonders here (please consult the man-page of bash).
>>> If you try
>>>
>>> echo "$list"
>>>
>>> you will see that you will get
>>>
>>> holy7c26401
>>> holy7c26402
>>> holy7c26403
>>> holy7c26404
>>> holy7c26405
>>>
>>> So you *can* pass this around in a variable if you use "$variable"
>>> whenever you provide it to a utility.
>>>
>>> Regards,
>>> Hermann
>>>
>>> --
>>> slurm-users mailing list -- slurm-users@lists.schedmd.com
>>> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>>>
>>
>>
>> --
>> slurm-users mailing list -- slurm-users@lists.schedmd.com
>> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>>
>

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Annoying canonical question about converting SLURM_JOB_NODELIST to a host list for mpirun

2024-08-12 Thread Paul Edmon via slurm-users
Ah, that's even more fun. I know with Singularity you can launch MPI 
applications by calling MPI outside of the container and then having it 
link to the internal version: 
https://docs.sylabs.io/guides/3.3/user-guide/mpi.html  Not sure about 
Docker though.
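
If I remember the hybrid model from that guide correctly, it boils down to calling the host-side mpirun and letting each rank exec into the container, something like this (a sketch; the image and binary paths are made up, and the MPI inside the image has to be compatible with the one outside):

scontrol show hostnames "$SLURM_JOB_NODELIST" > hostfile.$SLURM_JOB_ID
mpirun -np "$SLURM_NTASKS" --hostfile hostfile.$SLURM_JOB_ID \
    singularity exec my_container.sif /opt/app/my_mpi_app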


-Paul Edmon-

On 8/12/2024 10:30 AM, Jeffrey Layton wrote:
It's in a container - specifically horovod/horovod on Docker Hub. 
I'm going into the container to investigate now (I think I have a link 
to the Dockerfile as well).


Thanks!

Jeff


On Mon, Aug 12, 2024 at 10:01 AM Paul Edmon  wrote:


Certainly a strange setup. I would probably talk with whoever is
providing MPI for you and ask them to build it against Slurm
properly. In order to get correct process binding you
definitely want to have it integrated properly with Slurm, either
via PMI2 or PMIx. If you just use the bare hostlist, your ranks
may not end up properly bound to the specific cores they were
allocated. So definitely proceed with caution and
validate that your ranks are being laid out properly, as you will be
relying on mpirun/mpiexec to bootstrap rather than the scheduler.

-Paul Edmon-

On 8/12/2024 9:55 AM, Jeffrey Layton wrote:

Paul,

I tend not to rely on the MPI being built with Slurm :) I find
that the systems I use haven't done that. :( I'm not exactly sure
why, but that is the way it is :)

Up to now, using scontrol has always worked for me. However, a
new system is not cooperating (the job runs on the submission
host and not the compute nodes) and I'm trying to debug it. My
first step was to check that the job was getting the compute
node names (the list of nodes from Slurm is empty). This led to
my question about the "canonical" way to get the hostlist (I'm
checking both by using the hostlist and by relying on Slurm being
integrated into the MPI - neither works, since the hostlist is
empty).

It looks like there is a canonical way to do it as you mentioned.
FAQ worthy? Definitely for my own Slurm FAQ. Others will decide
if it is worthy for Slurm docs :)

Thanks everyone for your help!

Jeff


On Mon, Aug 12, 2024 at 9:36 AM Paul Edmon via slurm-users
<slurm-users@lists.schedmd.com> wrote:

Normally MPI will just pick up the host list from Slurm
itself. You just need to build MPI against Slurm and it will
just grab it. Typically this is transparent to the user.
Normally you shouldn't need to pass a host list at all. See:
https://slurm.schedmd.com/mpi_guide.html

The canonical way to do it if you need to would be the
scontrol show hostnames command against the
$SLURM_JOB_NODELIST
(https://slurm.schedmd.com/scontrol.html#OPT_hostnames). That
will give you the list of hosts your job is set to run on.

-Paul Edmon-

On 8/12/2024 8:34 AM, Jeffrey Layton via slurm-users wrote:

Thanks! I admit I'm not that experienced in Bash. I will
give this a whirl as a test.

In the meantime, let me ask: what is the "canonical" way to
create the host list? It would be nice to have this in the
Slurm FAQ somewhere.

Thanks!

Jeff



On Fri, Aug 9, 2024 at 1:32 PM Hermann Schwärzler via
slurm-users <slurm-users@lists.schedmd.com> wrote:

Hi Paul,

On 8/9/24 18:45, Paul Edmon via slurm-users wrote:
> As I recall I think OpenMPI needs a list that has an entry on each line,
> rather than one separated by a space. See:
>
> [root@holy7c26401 ~]# echo $SLURM_JOB_NODELIST
> holy7c[26401-26405]
> [root@holy7c26401 ~]# scontrol show hostnames $SLURM_JOB_NODELIST
> holy7c26401
> holy7c26402
> holy7c26403
> holy7c26404
> holy7c26405
>
> [root@holy7c26401 ~]# list=$(scontrol show hostname $SLURM_NODELIST)
> [root@holy7c26401 ~]# echo $list
> holy7c26401 holy7c26402 holy7c26403 holy7c26404 holy7c26405

proper quoting does wonders here (please consult the man-page of bash).
If you try

echo "$list"

you will see that you will get

holy7c26401
holy7c26402
holy7c26403
holy7c26404
holy7c26405

So you *can* pass this around in a variable if you use
"$variable"
whenever you provide it to a utility.

Regards,
Hermann

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com

To unsubscribe send an email to
slurm-users-le...@lists.schedmd.com




-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com

To unsubscribe send an email to
slurm-users-le...@lists.schedmd

[slurm-users] Seeking Commercial SLURM Subscription Provider

2024-08-12 Thread John Joseph via slurm-users

Dear All,

Good morning.

We successfully implemented a 4-node SLURM cluster with shared storage using 
GlusterFS and were able to run COMSOL programs on it. After this learning 
experience, we've determined that it would be beneficial to switch to a 
commercial SLURM subscription for better support.

We are currently seeking a solution provider who can offer support based on a 
commercial subscription. I would like to reach out to the group for 
recommendations or advice on how we can obtain these services commercially.

Thank you.

Joseph John


-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com