Re: [slurm-users] Usage gathering for GPUs

2023-06-06 Thread Vecerka Daniel
Hi all, I'm trying to get the gathering of gres/gpumem and gres/gpuutil working on Slurm 23.02.2, but with no success yet. We have AccountingStorageTRES=cpu,mem,gres/gpu in slurm.conf, and Slurm is built with NVML support. Autodetect=NVML in gres.conf gres/gpumem and gres/gpuutil now a

[slurm-users] Moving Job from local to remote Cluster

2023-06-06 Thread Shaghuf Rahman
Hi, Greetings of the day! I need your suggestions on the use case below. I have two Slurm clusters pointing to the same database server. I submit jobs to the local cluster, and once the local cluster's resources are full I want to move the pending jobs to the remote cluster. Is there any way to ach

[slurm-users] Cloud node utilization reporting

2023-06-06 Thread Chip Seraphine
Hello, I’ve got a problem that I’d imagine others have as well and am wondering how it is handled. I produce periodic reports for my management showing, among other things, the overall “cluster utilization”, which we define as basically the ratio of CPU*Minutes allocated to CPU*Minutes availab

[slurm-users] Trying to update from slurm 19.05 to slurm 23.02 but I can't figure out how to allow users to reboot nodes...

2023-06-06 Thread Heinz, Michael
I recently took over running a slurm cluster that among other things allows users to reboot nodes in order to have them re-configured for different kinds of tests. This is accomplished through the RebootProgram config setting. But in the test environment I set up we seem to have lost that capabi

Re: [slurm-users] Trying to update from slurm 19.05 to slurm 23.02 but I can't figure out how to allow users to reboot nodes...

2023-06-06 Thread Christopher Samuel
On 6/6/23 1:33 pm, Heinz, Michael wrote: I've gone through the man pages for slurm.conf but I can't find anything about how to define who the admins are? Is there still a way to do this with slurm or has the ability been removed? Looks like that was disabled over 3 years ago. commit dd111a5

Re: [slurm-users] Trying to update from slurm 19.05 to slurm 23.02 but I can't figure out how to allow users to reboot nodes...

2023-06-06 Thread Heinz, Michael
Yeah, looks like if we still want to do this I need to set up slurmdbd and an account database.

[slurm-users] Billing/accounting for MIGs is not working

2023-06-06 Thread Richard Lefebvre
We have MIG defined and in use, but the billing for which MIG is used doesn't seem to work. In the partition definitions in slurm.conf I have something like the following for TRESBillingWeights: TRESBillingWeights=CPU=1,Mem=1G,GRES/gpu:3g.20gb=0.375,GRES/gpu:4g.20gb=0.5,GRES/gpu=1.0 Yet, when I do sacct -j I d