[slurm-users] Re: Why AllowAccounts not work in slurm-23.11.6

2024-10-16 Thread shaobo liu via slurm-users
Tested slurm-23.* versions; the AllowAccounts parameter does not work. daijiangkuicgo--- via slurm-users wrote on Sat, 29 Jun 2024 at 16:30: > AllowGroups is ok.

[slurm-users] Re: AllowAccounts partition setting

2024-10-16 Thread shaobo liu via slurm-users
Tested slurm-23.* versions; the AllowAccounts parameter does not work. Carsten Beyer via slurm-users wrote on Tue, 2 Jul 2024 at 15:54: > Hi Christine, > > we don't use AllowGroups but have AllowAccounts, which is not working > anymore as expected in version 23.02.x. We tried also with > EnforcePartLimits. > > I

[slurm-users] Re: How do you guys track which GPU is used by which job ?

2024-10-16 Thread Brian Andrus via slurm-users
Looks like there is a step you would need to do to create the required job mapping files: /The DCGM-exporter can include High-Performance Computing (HPC) job information into its metric labels. To achieve this, HPC environment administrators must configure their HPC environment to generate fil

[slurm-users] Re: Dependency jobs

2024-10-16 Thread Laura Hild via slurm-users
> I know you can show job info and find what dependency a job is waiting > for, But more after checking if there are jobs waiting on the current > job to complete using the job ID, You mean you don't wanna like squeue -o%i,%E | grep SOME_JOBID ? Although I guess that won't catch a matching `s
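
The squeue-and-grep idea above can be tightened so that it matches only the exact job ID and skips the job's own line. The following is a minimal sketch, not a tested site tool: it assumes the output of `squeue -o%i,%E` has already been captured as text, and the sample lines (including the `afterok:1001(unfulfilled)` style of the dependency field) are illustrative assumptions that may differ between Slurm versions. The function name `dependents_of` is hypothetical.

```python
import re


def dependents_of(job_id, squeue_output):
    """Return IDs of jobs whose dependency field names job_id exactly.

    squeue_output is expected to hold one "JOBID,DEPENDENCY" pair per line,
    as produced by something like `squeue -o%i,%E` (format assumed here).
    """
    target = str(job_id)
    # Reject matches that are part of a longer number, e.g. 1001 inside 10010.
    pattern = re.compile(r"(?<![\d_])" + re.escape(target) + r"(?!\d)")
    waiting = []
    for line in squeue_output.splitlines():
        line = line.strip()
        if not line or "," not in line:
            continue
        jid, dependency = line.split(",", 1)
        if jid == target:
            continue  # the job itself, not a dependent
        if pattern.search(dependency):
            waiting.append(jid)
    return waiting


# Illustrative sample only; real squeue output may be formatted differently.
sample = """JOBID,DEPENDENCY
1001,(null)
1002,afterok:1001(unfulfilled)
1003,afterany:10010(unfulfilled)
1004,afterok:1002(unfulfilled),afterok:1001(unfulfilled)
"""

print(dependents_of(1001, sample))
```

A plain `grep 1001` would also hit job 1003 (which waits on 10010) and job 1001's own line; the look-around pattern avoids both. Dependencies that do not name a job ID at all are not detected by this sketch.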

[slurm-users] Re: Problem with nodes with 1 gpu

2024-10-16 Thread Jörg Striewski via slurm-users
thanks, that was the bug, now it works On 16.10.24 15:25, Groner, Rob wrote: Maybe I'm reading it wrong, but your partition sets DefMemPerGPU at 32000 and the nodes only have 31000 real memory available. Rob *From:* Jör

[slurm-users] Re: Problem with nodes with 1 gpu

2024-10-16 Thread Groner, Rob via slurm-users
Maybe I'm reading it wrong, but your partition sets DefMemPerGPU at 32000 and the nodes only have 31000 real memory available. Rob From: Jörg Striewski via slurm-users Sent: Wednesday, October 16, 2024 4:05 AM To: slurm-users@lists.schedmd.com Subject: [slurm-u
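
The arithmetic behind this diagnosis can be checked in a few lines. This is a sketch under the assumption that a job with no explicit memory request receives DefMemPerGPU megabytes for each GPU it asks for, and that the node's RealMemory must cover that total; the helper name `default_gpu_memory_fits` is hypothetical, and the values are the ones quoted in the thread.

```python
def default_gpu_memory_fits(def_mem_per_gpu_mb, gpus_requested, real_memory_mb):
    """Return True if the default per-GPU memory request fits on the node.

    Assumes the default memory request is DefMemPerGPU multiplied by the
    number of GPUs requested, compared against the node's RealMemory.
    """
    return def_mem_per_gpu_mb * gpus_requested <= real_memory_mb


# Values from the thread: DefMemPerGPU=32000, one GPU per node, RealMemory=31000.
print(default_gpu_memory_fits(32000, 1, 31000))  # False: 32000 MB > 31000 MB
# A hypothetical lower default that would fit on the same node.
print(default_gpu_memory_fits(30000, 1, 31000))  # True
```

Under that assumption, either lowering DefMemPerGPU below the node's RealMemory or requesting memory explicitly would let single-GPU jobs be scheduled on these nodes.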

[slurm-users] How do you guys track which GPU is used by which job ?

2024-10-16 Thread Sylvain MARET via slurm-users
Hey guys ! I'm looking to improve GPU monitoring on our cluster. I want to install this https://github.com/NVIDIA/dcgm-exporter and saw in the README that it can support tracking of job id : https://github.com/NVIDIA/dcgm-exporter?tab=readme-ov-file#enabling-hpc-job-mapping-on-dcgm-exporter

[slurm-users] Dependency jobs

2024-10-16 Thread adam--- via slurm-users
Hi Slurm Users, I am trying to figure out whether there is a way to check if a running job has any queued jobs that depend on it. I know you can show job info and find what dependency a job is waiting for, but I am more after checking whether there are jobs waiting on the

[slurm-users] Problem with nodes with 1 gpu

2024-10-16 Thread Jörg Striewski via slurm-users
I cannot send jobs to nodes with one GPU, and I can't find the bug in my configuration. Can someone help me? In slurm.conf, GresTypes=gpu is set. These are some nodes in slurm.conf: NodeName=gpu-[001-003] CPUs=8 SocketsPerBoard=1 CoresPerSocket=4 RealMemory=31000 Gres=gpu:1080:1 NodeN