[slurm-users] Re: REST API - get_user_environment

2024-08-29 Thread Daniel Letai via slurm-users
Actually, strictly speaking this is not Slurm versioning but OpenAPI 
versioning - the move from 0.0.38 to 0.0.39 also dropped this particular 
endpoint.


You will notice that the same major Slurm version supports different API 
versions.
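
To illustrate (not from the thread): the OpenAPI plugin version is part of every slurmrestd endpoint path, which is why a client pinned to one plugin version keeps working across Slurm upgrades only while that plugin is still shipped. A minimal sketch of the path shape; treat the specific resource names as assumptions:

```python
# Sketch: build versioned slurmrestd endpoint paths. The version component
# ("v0.0.38", "v0.0.39", ...) is the OpenAPI plugin version, not the Slurm
# release version; endpoints can appear or disappear between plugin versions
# even within one Slurm release.

def endpoint(api_version: str, resource: str) -> str:
    """Build a slurmrestd URL path for a given OpenAPI plugin version."""
    return f"/slurm/{api_version}/{resource}"

# The same Slurm release can serve several plugin versions side by side,
# so a client should pin the version it was written against:
print(endpoint("v0.0.38", "ping"))  # /slurm/v0.0.38/ping
print(endpoint("v0.0.39", "ping"))  # /slurm/v0.0.39/ping
```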



On 28/08/2024 03:02:00, Chris Samuel via slurm-users wrote:

On 27/8/24 10:26 am, jpuerto--- via slurm-users wrote:


Is anyone in contact with the development team?


Folks with a support contract can submit bugs at 
https://support.schedmd.com/


I feel that this is pretty basic functionality that was removed from 
the REST API without warning. Considering that this was a "patch" 
release (based on traditional semantic versioning guidelines), this 
type of modification shouldn't have happened and makes me worry about 
upgrading in the future.


Slurm hasn't used semantic versioning for a long time; it moved to a 
year.month.minor scheme years ago. Major releases are (now) every 6 
months, so the most recent ones have been:


* 23.02.0
* 23.11.0 (old 9 month system)
* 24.05.0 (new 6 month system)

Next major release should be in November:

* 24.11.0

All the best,
Chris


--
Regards,

Daniel Letai
+972 (0)505 870 456

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Multiple Counts Question

2024-08-29 Thread Matteo Guglielmi via slurm-users
Hello,


Does anyone know why this is possible in slurm:


--constraint="[rack1*2&rack2*4]"


and this is not:


--constraint="[rack1*2|rack2*4]"


?



Thank you.

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] playing with --nodes=

2024-08-29 Thread Matteo Guglielmi via slurm-users
Hello,

I have a cluster with four Intel nodes (node[01-04], Feature=intel) and four 
Amd nodes (node[05-08], Feature=amd).

# job file

#SBATCH --ntasks=3
#SBATCH --nodes=2,4
#SBATCH --constraint="[intel|amd]"


env | grep SLURM


# slurm.conf


PartitionName=DEFAULT  MinNodes=1 MaxNodes=UNLIMITED


# log


SLURM_JOB_USER=software
SLURM_TASKS_PER_NODE=1(x3)
SLURM_JOB_UID=1002
SLURM_TASK_PID=49987
SLURM_LOCALID=0
SLURM_SUBMIT_DIR=/home/software
SLURMD_NODENAME=node01
SLURM_JOB_START_TIME=1724932865
SLURM_CLUSTER_NAME=cluster
SLURM_JOB_END_TIME=1724933465
SLURM_CPUS_ON_NODE=1
SLURM_JOB_CPUS_PER_NODE=1(x3)
SLURM_GTIDS=0
SLURM_JOB_PARTITION=nodes
SLURM_JOB_NUM_NODES=3
SLURM_JOBID=26
SLURM_JOB_QOS=lprio
SLURM_PROCID=0
SLURM_NTASKS=3
SLURM_TOPOLOGY_ADDR=node01
SLURM_TOPOLOGY_ADDR_PATTERN=node
SLURM_MEM_PER_CPU=0
SLURM_NODELIST=node[01-03]
SLURM_JOB_ACCOUNT=dalco
SLURM_PRIO_PROCESS=0
SLURM_NPROCS=3
SLURM_NNODES=3
SLURM_SUBMIT_HOST=master
SLURM_JOB_ID=26
SLURM_NODEID=0
SLURM_CONF=/etc/slurm/slurm.conf
SLURM_JOB_NAME=mpijob
SLURM_JOB_GID=1002

SLURM_JOB_NODELIST=node[01-03] <<<=== why three nodes? Shouldn't this still be 
two nodes?

Thank you.



-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Best practices for tracking jobs started across multiple clusters for accounting purposes.

2024-08-29 Thread David via slurm-users
Hello,

What is meant here by "tracking"? What information are you looking to
gather and track?

I'd say the simplest answer is using sacct, but I am not sure how
federated/non-federated setups come into play while using it.
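
Not from the thread, but a sketch of the sacct route: `sacct --parsable2` emits pipe-delimited rows, and multi-cluster output (e.g. via `-M`/`--clusters`) includes a Cluster column to aggregate on. The sample data below is made up for illustration:

```python
# Sketch: aggregate CPU-seconds per (cluster, user) from output in the style of
#   sacct -a -M all --parsable2 --format=Cluster,User,JobID,CPUTimeRAW
# The sample string below is illustrative, not real accounting data.
from collections import defaultdict

def cpu_seconds_by_cluster_user(sacct_output: str) -> dict:
    lines = sacct_output.strip().splitlines()
    idx = {name: i for i, name in enumerate(lines[0].split("|"))}
    totals = defaultdict(int)
    for line in lines[1:]:
        fields = line.split("|")
        user = fields[idx["User"]]
        if not user:  # job steps carry an empty User field; skip them
            continue
        totals[(fields[idx["Cluster"]], user)] += int(fields[idx["CPUTimeRAW"]])
    return dict(totals)

sample = """Cluster|User|JobID|CPUTimeRAW
clusterA|alice|101|3600
clusterA|alice|102|1800
clusterB|alice|7|600"""
print(cpu_seconds_by_cluster_user(sample))
# {('clusterA', 'alice'): 5400, ('clusterB', 'alice'): 600}
```

Federation changes job ID semantics but not the basic aggregation; the per-cluster grouping is what keeps cross-cluster totals honest.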

David

On Tue, Aug 27, 2024 at 6:23 AM Di Bernardini, Fabio via slurm-users <
slurm-users@lists.schedmd.com> wrote:

> I need to account for jobs composed of multiple jobs launched on multiple
> federated (and non-federated) clusters, which therefore have different job
> IDs. What are the best practices to prevent users from bypassing this
> tracking?
>
>
>
> NICE SRL, viale Monte Grappa 3/5, 20124 Milano, Italia, Registro delle
> Imprese di Milano Monza Brianza Lodi REA n. 2096882, Capitale Sociale:
> 10.329,14 EUR i.v., Cod. Fisc. e P.IVA 01133050052, Societa con Socio Unico
>
>
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>


-- 
David Rhey
---
Advanced Research Computing
University of Michigan

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Print Slurm Stats on Login

2024-08-29 Thread Paul Edmon via slurm-users

Thanks. I've made that fix.

-Paul Edmon-

On 8/28/24 5:42 PM, Davide DelVento wrote:
Thanks everybody once again and especially Paul: your job_summary 
script was exactly what I needed, served on a golden plate. I just had 
to modify/customize the date range and change the following line (I 
can make a PR if you want, but it's such a small change that it'd take 
more time to deal with the PR than just typing it)


-        Timelimit = time_to_float(Timelimit.replace('UNLIMITED','365-00:00:00'))
+        Timelimit = time_to_float(Timelimit.replace('UNLIMITED','365-00:00:00').replace('Partition_Limit','365-00:00:00'))
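
For context, `time_to_float` is a helper in Paul's script; the version below is only a plausible stand-in (return unit assumed to be hours) to show why the sentinel replacement matters: strings like UNLIMITED or Partition_Limit are not parseable as D-HH:MM:SS, so they are mapped to '365-00:00:00' first.

```python
# Hypothetical stand-in for the script's time_to_float helper (the real one
# lives in the fasrc job_summary script; hours as the return unit is assumed).
def time_to_float(t: str) -> float:
    """Parse a Slurm [D-]HH:MM:SS time string into hours."""
    days = 0
    if "-" in t:
        d, t = t.split("-", 1)
        days = int(d)
    h, m, s = (int(x) for x in t.split(":"))
    return days * 24 + h + m / 60 + s / 3600

# Sentinel values must be normalized before parsing, which is what the
# patched line does:
Timelimit = "Partition_Limit"
Timelimit = time_to_float(
    Timelimit.replace("UNLIMITED", "365-00:00:00")
             .replace("Partition_Limit", "365-00:00:00"))
print(Timelimit)  # 8760.0
```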


Cheers,
Davide


On Tue, Aug 27, 2024 at 1:40 PM Paul Edmon via slurm-users wrote:


This thread went in a bunch of different directions. However, I ran with
Jeffrey's suggestion and wrote up a profile.d script along with other
supporting scripts to pull the data. The setup I put together is here
for the community to use as they see fit:

https://github.com/fasrc/puppet-slurm_stats

While this is written as a Puppet module, the scripts therein can be
used by anyone, as it's a pretty straightforward setup and the
templates have obvious places to do a find and replace.

Naturally I'm happy to take additional merge requests. Thanks for all
the interesting conversation about this. Lots of great ideas.

-Paul Edmon-

On 8/9/24 12:04 PM, Jeffrey T Frey wrote:
> You'd have to do this within e.g. the system's bashrc
infrastructure.  The simplest idea would be to add to e.g.
/etc/profile.d/zzz-slurmstats.sh and have some canned
commands/scripts running.  That does introduce load to the system
and Slurm on every login, though, and slows the startup of login
shells based on how responsive slurmctld/slurmdbd are at that moment.
>
> Another option would be to run the commands/scripts for all
users on some timed schedule — e.g. produce per-user stats every
30 minutes.  So long as the stats are publicly-visible anyway, put
those summaries in a shared file system with open read access. 
Name the files by uid number.  Now your /etc/profile.d script just
cat's ${STATS_DIR}/$(id -u).
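>
> Jeffrey's timed-cache scheme can be sketched as two pieces: a periodic
> writer that renders one file per uid, and a login-time reader that only
> does a file read. STATS_DIR and the stats text are placeholders, not
> anything from an existing site setup:

```python
# Sketch of the timed-cache approach: a cron/systemd-timer job calls
# write_stats() for every user, and the /etc/profile.d hook at login
# reduces to reading one small file (the equivalent of
# `cat ${STATS_DIR}/$(id -u)`). STATS_DIR is a hypothetical location.
import os
from pathlib import Path

STATS_DIR = Path("/var/cache/slurm-login-stats")

def write_stats(uid: int, summary: str, stats_dir: Path = STATS_DIR) -> None:
    """Run on a timer: render a user's usage summary into a per-uid file."""
    stats_dir.mkdir(parents=True, exist_ok=True)
    (stats_dir / str(uid)).write_text(summary)

def read_stats(stats_dir: Path = STATS_DIR) -> str:
    """Run at login: no slurmctld/slurmdbd traffic, just one file read."""
    f = stats_dir / str(os.getuid())
    return f.read_text() if f.exists() else ""
```

The writer is where the actual sacct/sshare calls would live; logins stay fast even when the controller is busy, and a missing file degrades to printing nothing.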
>
>
>
>
>> On Aug 9, 2024, at 11:11, Paul Edmon via slurm-users
 wrote:
>>
>> We are working to make our users more aware of their usage. One
of the ideas we came up with was to having some basic usage stats
printed at login (usage over past day, fairshare, job efficiency,
etc). Does anyone have any scripts or methods that they use to do
this? Before baking my own I was curious what other sites do and
if they would be willing to share their scripts and methodology.
>>
>> -Paul Edmon-
>>
>>
>> --
>> slurm-users mailing list -- slurm-users@lists.schedmd.com
>> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com

To unsubscribe send an email to slurm-users-le...@lists.schedmd.com

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: playing with --nodes=

2024-08-29 Thread Brian Andrus via slurm-users

Your --nodes line is incorrect:

-N, --nodes=<minnodes>[-maxnodes]|<size_string>
   Request that a minimum of minnodes nodes be allocated to this job. A
   maximum node count may also be specified with maxnodes.

Looks like it ignored that and used ntasks with ntasks-per-node as 1,
giving you 3 nodes. Check your logs and your conf to see what your
defaults are.


Brian Andrus


On 8/29/2024 5:04 AM, Matteo Guglielmi via slurm-users wrote:



-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: playing with --nodes=

2024-08-29 Thread Matteo Guglielmi via slurm-users
Hi,


On sbatch's manpage there is this example for <size_string>:


--nodes=1,5,9,13


so either one specifies <minnodes>[-maxnodes] OR <size_string>.


I checked the logs, and there are no reported errors about wrong or ignored 
options.


MG


From: Brian Andrus via slurm-users 
Sent: Thursday, August 29, 2024 4:11:25 PM
To: slurm-users@lists.schedmd.com
Subject: [slurm-users] Re: playing with --nodes=


Your --nodes line is incorrect:

-N, --nodes=<minnodes>[-maxnodes]|<size_string>
Request that a minimum of minnodes nodes be allocated to this job. A maximum 
node count may also be specified with maxnodes.

Looks like it ignored that and used ntasks with ntasks-per-node as 1, giving 
you 3 nodes. Check your logs and your conf to see what your defaults are.

Brian Andrus







-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: playing with --nodes=

2024-08-29 Thread Brian Andrus via slurm-users
It looks to me like you requested 3 tasks spread across 2 to 4 nodes. 
Realize --nodes is not targeting nodes named 2 and 4; it is a count 
of how many nodes to use. You only needed 3 tasks/cpus, so that is what 
you were allocated, and since you have 1 cpu per node you get 3 (of up to 
4) nodes. Slurm does not give you 4 nodes because you only want 3 tasks.


You see the result in your variables:

SLURM_NNODES=3
SLURM_JOB_CPUS_PER_NODE=1(x3)

If you only want 2 nodes, make --nodes=2.
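
Brian's accounting can be written out: with 1 allocatable CPU per node, 3 tasks need ceil(3/1) = 3 nodes, which then has to fit the requested node-count bounds. A sketch of that arithmetic under Brian's reading of `--nodes=2,4` as a 2-4 range (this models his explanation, not Slurm's actual scheduler code):

```python
# Sketch (not Slurm source): how many nodes a job like
#   --ntasks=3 with 1 allocatable CPU per node and a 2-4 node range
# ends up with: just enough nodes for the tasks, clamped into the range.
import math

def nodes_allocated(ntasks: int, cpus_per_node: int,
                    min_nodes: int, max_nodes: int) -> int:
    needed = math.ceil(ntasks / cpus_per_node)
    return min(max(needed, min_nodes), max_nodes)

print(nodes_allocated(ntasks=3, cpus_per_node=1, min_nodes=2, max_nodes=4))  # 3
```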

Brian Andrus

On 8/29/24 08:00, Matteo Guglielmi via slurm-users wrote:


Hi,


On sbatch's manpage there is this example for <size_string>:


--nodes=1,5,9,13


so either one specifies <minnodes>[-maxnodes] OR <size_string>.


I checked the logs, and there are no reported errors about wrong or ignored 
options.


MG







-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: playing with --nodes=

2024-08-29 Thread Matteo Guglielmi via slurm-users
I'm sorry, but I still don't get it.


Isn't --nodes=2,4 telling slurm to allocate 2 OR 4 nodes and nothing else?


So, if:


--nodes=2 allocates only two nodes

--nodes=4 allocates only four nodes

--nodes=1-2 allocates min one and max two nodes

--nodes=1-4 allocates min one and max four nodes


what is the allocation rule for --nodes=2,4 which is the so-called size_string 
allocation?


man sbatch says:


Node count can also be specified as size_string. The size_string specification 
identifies what nodes values should be used. Multiple values may be specified 
using a comma separated list or with a step function by suffix containing a 
colon and number values with a "-" separator. For example, "--nodes=1-15:4" is 
equivalent to "--nodes=1,5,9,13".

...

The job will be allocated as many nodes as possible within the range specified 
and without delaying the initiation of the job.
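
The step-function form quoted above can be made concrete. A sketch of expanding a --nodes size_string (comma list, plain range, or "min-max:step") into the candidate node counts, following the manpage example; this is an illustration, not Slurm's actual parser:

```python
# Sketch: expand an sbatch --nodes "size_string" (e.g. "1-15:4" or "2,4")
# into the list of acceptable node counts, per the manpage equivalence
# "--nodes=1-15:4" == "--nodes=1,5,9,13". Not Slurm's actual parser.
def expand_size_string(spec: str) -> list[int]:
    counts = []
    for part in spec.split(","):
        if ":" in part:                   # step form: "min-max:step"
            rng, step = part.rsplit(":", 1)
            lo, hi = (int(x) for x in rng.split("-"))
            counts.extend(range(lo, hi + 1, int(step)))
        elif "-" in part:                 # plain range: "min-max"
            lo, hi = (int(x) for x in part.split("-"))
            counts.extend(range(lo, hi + 1))
        else:                             # single count
            counts.append(int(part))
    return counts

print(expand_size_string("1-15:4"))  # [1, 5, 9, 13]
print(expand_size_string("2,4"))     # [2, 4]
```

Under that reading, "--nodes=2,4" names exactly the counts 2 and 4, which is why the observed 3-node allocation is surprising.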


From: Brian Andrus via slurm-users 
Sent: Thursday, August 29, 2024 7:27:44 PM
To: slurm-users@lists.schedmd.com
Subject: [slurm-users] Re: playing with --nodes=


It looks to me like you requested 3 tasks spread across 2 to 4 nodes. Realize 
--nodes is not targeting nodes named 2 and 4; it is a count of how many 
nodes to use. You only needed 3 tasks/cpus, so that is what you were allocated, 
and since you have 1 cpu per node you get 3 (of up to 4) nodes. Slurm does not 
give you 4 nodes because you only want 3 tasks.

You see the result in your variables:

SLURM_NNODES=3
SLURM_JOB_CPUS_PER_NODE=1(x3)



If you only want 2 nodes, make --nodes=2.

Brian Andrus








-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com