[slurm-users] Re: GPU Accounting

2024-10-03 Thread Bjørn-Helge Mevik via slurm-users
Emyr James via slurm-users writes:

> I have this set in slurm.conf
>
> AccountingStorageTRES=gres/gpu

I believe you need to list all types of GPUs (including MIGs) that you have
configured on the nodes, in addition to the general "gres/gpu".  For instance,
on one of our clusters, we have

  
AccountingStorageTRES=gres/gpu,gres/gpu:a100,gres/gpu:rtx30,gres/gpu:1g.20gb,gres/gpu:a40

Then AllocTRES from sacct will show things like

  billing=19,cpu=6,gres/gpu:a100=1,gres/gpu=1,mem=12G,node=1

depending on what the job specifies.
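
For post-processing, an AllocTRES string like the one above can be split
into a dictionary.  This is only a minimal Python sketch: parse_tres is a
hypothetical helper name, not part of Slurm, and values are kept as strings
because fields such as mem carry a unit suffix.

```python
def parse_tres(tres: str) -> dict[str, str]:
    """Split a comma-separated TRES string into {name: value}."""
    result = {}
    for item in tres.split(","):
        if "=" not in item:
            continue  # skip empty or malformed fields
        # Split on the last "=": names such as gres/gpu:a100 contain "/" and ":"
        name, _, value = item.rpartition("=")
        result[name] = value
    return result


alloc = parse_tres("billing=19,cpu=6,gres/gpu:a100=1,gres/gpu=1,mem=12G,node=1")
print(alloc["gres/gpu:a100"], alloc["gres/gpu"], alloc["mem"])  # prints: 1 1 12G
```

A job that did not request a specific GPU type would simply lack the typed
key, so alloc.get("gres/gpu:a100") is the safer lookup in that case.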

-- 
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo


-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] license server redundancy

2024-10-03 Thread Yves Kondoszek via slurm-users
Hello.

I'm configuring a cluster that will run jobs that use FlexLM licenses.
I believe our license server setup is quite standard, though I haven't
found anything about it in the documentation or in the mailing list
archives: the same feature is often provided by several servers. For
example, the license token "simulation" might be available at
1717@server1, 1717@server2 and 4000@server3. FlexLM clients support
this through a path-style environment variable, typically
LM_LICENSE_FILE=1717@server1:1717@server2:4000@server3 .
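
To make the format concrete, here is a minimal Python sketch that splits
such a value into (port, host) pairs. The function name split_license_path
is hypothetical; the sketch assumes the ":" separator shown above and only
handles port@host entries, skipping anything without an "@" (such as a path
to a license file).

```python
from typing import Optional


def split_license_path(value: str, sep: str = ":") -> list[tuple[Optional[int], str]]:
    """Split an LM_LICENSE_FILE-style value into (port, host) pairs."""
    servers = []
    for entry in value.split(sep):
        entry = entry.strip()
        if "@" not in entry:
            continue  # not a port@host entry; ignored in this sketch
        port, _, host = entry.partition("@")
        # An entry written as "@host" has no explicit port
        servers.append((int(port) if port else None, host))
    return servers


print(split_license_path("1717@server1:1717@server2:4000@server3"))
# prints: [(1717, 'server1'), (1717, 'server2'), (4000, 'server3')]
```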

Now, I have set up Slurm for remote dynamic licenses and update them
live through "LastConsumed" from lmstat requests, so that Slurm is aware
of licenses that users check out externally (outside Slurm's control).
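
As an illustration of the lmstat side, the following Python sketch extracts
per-feature usage from the "Users of ..." summary lines. It assumes the
usual summary wording of lmstat output, the sample line is made up for the
example, and parse_lmstat_usage is a hypothetical helper name. The in-use
count is the number that would be fed to "LastConsumed".

```python
import re

# Assumed lmstat summary format:
#   Users of <feature>:  (Total of N licenses issued;  Total of M licenses in use)
USAGE_RE = re.compile(
    r"Users of (?P<feature>[^:\s]+):\s+"
    r"\(Total of (?P<issued>\d+) licenses? issued;\s+"
    r"Total of (?P<in_use>\d+) licenses? in use\)"
)


def parse_lmstat_usage(text: str) -> dict[str, tuple[int, int]]:
    """Return {feature: (issued, in_use)} for every summary line found."""
    usage = {}
    for match in USAGE_RE.finditer(text):
        usage[match.group("feature")] = (
            int(match.group("issued")),
            int(match.group("in_use")),
        )
    return usage


# Made-up sample line, for illustration only
sample = "Users of simulation:  (Total of 10 licenses issued;  Total of 3 licenses in use)"
print(parse_lmstat_usage(sample))  # prints: {'simulation': (10, 3)}
```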

But my problem is that the documentation clearly states that:
"When submitting jobs to remote licenses, the name and server must be
used."
So I have to specifically choose a server:
$ sbatch -L simulation@server2 script.sh
instead of telling Slurm to just use whatever license is available on
any of the servers.

Do any of you have experience with redundant FlexLM licenses
through several servers? Is there a way Slurm could support this?
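
For reference, the kind of submit-time workaround I would like to avoid
looks roughly like this Python sketch. It is not a Slurm feature:
pick_license is a hypothetical helper, the free counts are made-up inputs
that a wrapper would have to obtain itself, and the choice can be stale by
the time the job starts.

```python
from typing import Optional


def pick_license(feature: str, free_by_server: dict[str, int], needed: int = 1) -> Optional[str]:
    """Return an sbatch -L argument for the server with the most free licenses."""
    candidates = {s: n for s, n in free_by_server.items() if n >= needed}
    if not candidates:
        return None  # no single server can satisfy the request right now
    server = max(candidates, key=candidates.get)
    spec = f"{feature}@{server}"
    # sbatch accepts an optional ":count" suffix on a license name
    return spec if needed == 1 else f"{spec}:{needed}"


# Hypothetical free-license counts per server
free = {"server1": 0, "server2": 3, "server3": 1}
print(pick_license("simulation", free))  # prints: simulation@server2
```

The result would be passed as "sbatch -L <spec> script.sh", which still
pins the job to one server instead of letting Slurm choose.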
