I cannot send jobs to nodes with one GPU, and I can't find the bug in my
configuration. Can someone help me?
In slurm.conf, GresTypes=gpu is set.
These are some of the nodes in slurm.conf:
NodeName=gpu-[001-003] CPUs=8 SocketsPerBoard=1
CoresPerSocket=4 RealMemory=31000 Gres=gpu:1080:1
NodeN
Hi Slurm Users,
I am trying to figure out whether there is a way to check if a running job has
any queued jobs that depend on it.
I know you can show job info and find which dependency a job is waiting for,
But more after checking if there are jobs waiting on the
> I know you can show job info and find what dependency a job is waiting
> for, But more after checking if there are jobs waiting on the current
> job to complete using the job ID,
You mean you don't wanna like
squeue -o%i,%E | grep SOME_JOBID
?
Although I guess that won't catch a matching `s
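
One way to avoid the substring problem with grep (searching for 100 also hits 1001) is to compare whole job IDs instead. A minimal sketch, assuming the output comes from `squeue -h -o "%i|%E"` and that the dependency field looks like `afterok:100(unfulfilled)` or `afterany:99:100`; the exact text of that field may differ between Slurm versions, and the function name is made up for illustration:

```python
import re


def dependents_of(squeue_output: str, job_id: int) -> list[str]:
    """Return IDs of queued jobs whose dependency field names job_id."""
    waiting = []
    for line in squeue_output.splitlines():
        # Expect "JOBID|DEPENDENCY"; lines without the delimiter are skipped.
        queued_id, sep, dependency = line.strip().partition("|")
        if not sep:
            continue
        # Collect every numeric job ID that directly follows a colon, e.g.
        # "afterok:123:456(unfulfilled)" -> ["123", "456"].
        referenced = re.findall(r":(\d+)", dependency)
        # Compare whole IDs so that 100 does not match 1001.
        if str(job_id) in referenced:
            waiting.append(queued_id)
    return waiting


# Hand-written sample in the assumed format, not real squeue output.
sample = (
    "101|(null)\n"
    "102|afterok:100(unfulfilled)\n"
    "103|afterok:1001(unfulfilled)\n"
    "104|afterany:99:100\n"
)
print(dependents_of(sample, 100))
```

On a real cluster the text could be fed in with `subprocess.run(["squeue", "-h", "-o", "%i|%E"], capture_output=True, text=True).stdout`. The `|` delimiter is used instead of a comma because a dependency list can itself contain commas.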
Hey guys!
I'm looking to improve GPU monitoring on our cluster. I want to install
https://github.com/NVIDIA/dcgm-exporter and saw in the README that
it can track job IDs:
https://github.com/NVIDIA/dcgm-exporter?tab=readme-ov-file#enabling-hpc-job-mapping-on-dcgm-exporter
Maybe I'm reading it wrong, but your partition sets DefMemPerGPU at 32000 and
the nodes only have 31000 real memory available.
Rob
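
The mismatch described above can be checked mechanically. A rough sketch, assuming each node definition has been joined onto a single line and that the DefMemPerGPU value is passed in by hand; it only compares the default against RealMemory for a one-GPU job and does not reproduce the scheduler's full logic. The function name is hypothetical:

```python
def nodes_too_small_for_one_gpu_job(node_lines, def_mem_per_gpu_mb):
    """Return (NodeName, RealMemory) for GPU nodes whose RealMemory is
    below the partition's DefMemPerGPU, so a one-GPU job cannot fit."""
    flagged = []
    for line in node_lines:
        # Split "Key=Value" tokens of a slurm.conf node line into a dict.
        fields = dict(tok.split("=", 1) for tok in line.split() if "=" in tok)
        gres = fields.get("Gres", "")
        # A Gres entry such as "gpu:1080:1" starts with the GRES name.
        has_gpu = any(entry.split(":")[0] == "gpu" for entry in gres.split(","))
        if not has_gpu or "RealMemory" not in fields:
            continue
        real_memory_mb = int(fields["RealMemory"])
        if def_mem_per_gpu_mb > real_memory_mb:
            flagged.append((fields.get("NodeName", "?"), real_memory_mb))
    return flagged


# Node line from the original post, joined onto one line.
nodes = [
    "NodeName=gpu-[001-003] CPUs=8 SocketsPerBoard=1 "
    "CoresPerSocket=4 RealMemory=31000 Gres=gpu:1080:1",
]
print(nodes_too_small_for_one_gpu_job(nodes, 32000))
```

With DefMemPerGPU at 32000 the node group is flagged; lowering the default to 31000 or less (or raising RealMemory, if the hardware really has more) makes the list empty.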
From: Jörg Striewski via slurm-users
Sent: Wednesday, October 16, 2024 4:05 AM
To: slurm-users@lists.schedmd.com
Subject: [slurm-u
Looks like there is a step you would need to do to create the required
job mapping files:
> The DCGM-exporter can include High-Performance Computing (HPC) job
> information into its metric labels. To achieve this, HPC environment
> administrators must configure their HPC environment to generate fil
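
A sketch of what generating such mapping files could look like, for example from a Slurm prolog and epilog. The layout is an assumption that should be checked against the README section linked earlier in the thread: one file per GPU index inside a mapping directory, with one job ID per line. The function names and the idea of calling them from prolog/epilog are illustrative only:

```python
from pathlib import Path


def add_job_to_gpus(mapping_dir, gpu_indices, job_id):
    """Append job_id to the mapping file of each GPU (assumed layout:
    <mapping_dir>/<gpu index>, one job ID per line)."""
    directory = Path(mapping_dir)
    directory.mkdir(parents=True, exist_ok=True)
    for gpu in gpu_indices:
        with open(directory / str(gpu), "a", encoding="utf-8") as fh:
            fh.write(f"{job_id}\n")


def remove_job_from_gpus(mapping_dir, gpu_indices, job_id):
    """Drop job_id from each GPU's mapping file, keeping other jobs."""
    for gpu in gpu_indices:
        path = Path(mapping_dir) / str(gpu)
        if not path.exists():
            continue
        lines = path.read_text(encoding="utf-8").splitlines()
        remaining = [entry for entry in lines if entry.strip() != str(job_id)]
        path.write_text("".join(f"{entry}\n" for entry in remaining),
                        encoding="utf-8")
```

The first function would run when a job starts on a node and the second when it ends, so the directory only ever lists jobs that currently hold a GPU.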
thanks, that was the bug, now it works
On 16.10.24 15:25, Groner, Rob wrote:
> Maybe I'm reading it wrong, but your partition sets DefMemPerGPU at
> 32000 and the nodes only have 31000 real memory available.
> Rob
> *From:* Jör
I tested the slurm-23.* versions; the AllowAccounts parameter does not work.
On Tue, Jul 2, 2024 at 15:54, Carsten Beyer via slurm-users wrote:
> Hi Christine,
>
> we don't use AllowGroups but have AllowAccounts, which is not working
> anymore as expected in version 23.02.x. We tried also with
> EnforcePartLimits.
>
> I
I tested the slurm-23.* versions; the AllowAccounts parameter does not work.
On Sat, Jun 29, 2024 at 16:30, daijiangkuicgo--- via slurm-users wrote:
> AllowGroups is ok.
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>