Tested with slurm-23.*: the AllowAccounts parameter does not work.
daijiangkuicgo--- via slurm-users wrote on Saturday, June 29, 2024 at 16:30:
> AllowGroups is ok.
Tested with slurm-23.*: the AllowAccounts parameter does not work.
Carsten Beyer via slurm-users wrote on Tuesday, July 2, 2024 at 15:54:
> Hi Christine,
>
> We don't use AllowGroups, but we do use AllowAccounts, which no longer
> works as expected in version 23.02.x. We also tried
> EnforcePartLimits.
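For reference, a minimal sketch of the settings being discussed
(partition, account, and group names are hypothetical):

PartitionName=projects Nodes=node[01-10] AllowAccounts=projA,projB
PartitionName=staff Nodes=node[11-20] AllowGroups=staff
EnforcePartLimits=ALL

AllowAccounts filters on the Slurm bank account a job is submitted
under (sbatch -A), while AllowGroups filters on the submitting user's
Unix group.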
Looks like there is a step you would need to do to create the required
job mapping files:
"The DCGM-exporter can include High-Performance Computing (HPC) job
information into its metric labels. To achieve this, HPC environment
administrators must configure their HPC environment to generate files …"
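Under Slurm, the usual way to generate such files is a prolog/epilog
pair. A rough sketch, assuming the mapping directory and file format
dcgm-exporter expects (check its README for the exact details) and that
SLURM_JOB_GPUS is populated on your Slurm version:

#!/bin/bash
# prolog.sh -- runs on each allocated node when a job starts
MAP_DIR=/run/dcgm-job-maps          # assumed; must match what dcgm-exporter reads
mkdir -p "$MAP_DIR"
# SLURM_JOB_GPUS holds the GPU indices assigned to the job, e.g. "0,1"
for gpu in ${SLURM_JOB_GPUS//,/ }; do
    echo "$SLURM_JOB_ID" > "$MAP_DIR/$gpu"   # one file per GPU, containing the job id
done

#!/bin/bash
# epilog.sh -- runs when the job ends; remove the mapping again
MAP_DIR=/run/dcgm-job-maps
for gpu in ${SLURM_JOB_GPUS//,/ }; do
    rm -f "$MAP_DIR/$gpu"
done

Wire these up with Prolog= and Epilog= in slurm.conf.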
> I know you can show job info and find what dependency a job is waiting
> for, but I'm more interested in checking whether there are jobs waiting
> on the current job to complete, using the job ID.
You mean something like
squeue -o%i,%E | grep SOME_JOBID
?
Although I guess that won't catch a matching `s…
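If you want just the IDs of the jobs that depend on a given one (job ID
12345 is hypothetical here), something like this is a bit more precise:

squeue --noheader -o "%i %E" | awk -v j=12345 '$2 ~ j {print $1}'

%E prints the dependency specification, e.g. afterok:12345, so matching
on the bare job ID also catches afterany/afternotok dependencies.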
Thanks, that was the bug; now it works.
On 16.10.24 15:25, Groner, Rob wrote:
> Maybe I'm reading it wrong, but your partition sets DefMemPerGPU at
> 32000 and the nodes only have 31000 real memory available.
> Rob
Maybe I'm reading it wrong, but your partition sets DefMemPerGPU at 32000 and
the nodes only have 31000 real memory available.
Rob
From: Jörg Striewski via slurm-users
Sent: Wednesday, October 16, 2024 4:05 AM
To: slurm-users@lists.schedmd.com
Subject: [slurm-users] …
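To illustrate Rob's point with a hedged sketch (partition name
hypothetical): the default memory handed to a job per GPU has to fit
inside the node's RealMemory, so something like

NodeName=gpu-[001-003] CPUs=8 RealMemory=31000 Gres=gpu:1080:1
PartitionName=gpu Nodes=gpu-[001-003] DefMemPerGPU=30000

would schedule, whereas DefMemPerGPU=32000 makes every one-GPU job
request more memory than any of these nodes can offer.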
Hey guys!
I'm looking to improve GPU monitoring on our cluster. I want to install
https://github.com/NVIDIA/dcgm-exporter and saw in the README that it
supports tracking of job IDs:
https://github.com/NVIDIA/dcgm-exporter?tab=readme-ov-file#enabling-hpc-job-mapping-on-dcgm-exporter
Hi Slurm Users,
I am trying to figure out if there is a way to check whether a running
job has any jobs queued up after it that depend on it.
I know you can show job info and find what dependency a job is waiting
for, but I'm more interested in checking whether there are jobs waiting
on the current job to complete, using the job ID.
I cannot send jobs to nodes with one GPU and I can't find the bug in my
configuration. Can someone help me?
In slurm.conf, GresTypes=gpu is set.
These are some of the nodes in slurm.conf:
NodeName=gpu-[001-003] CPUs=8 SocketsPerBoard=1
CoresPerSocket=4 RealMemory=31000 Gres=gpu:1080:1
NodeN…
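One thing worth checking for this symptom: besides GresTypes=gpu in
slurm.conf, each GPU node also needs a matching gres.conf entry (unless
AutoDetect is used), or slurmd won't register the device. A minimal
sketch, with the device path assumed:

# gres.conf on gpu-[001-003]
Name=gpu Type=1080 File=/dev/nvidia0

After restarting slurmd, 'scontrol show node gpu-001' should list the
gpu:1080:1 Gres.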