Thanks, Herman, for the feedback.
My reason for posting was to request a closer look at the systemd unit file for
slurmd so that this "nudging" would not be necessary.
I'd like to explore that a little more -- it looks like cgroupsv2 cpusets are
working for us in this configuration, except for [...]
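In case it helps the inspection, a minimal drop-in of the kind the cgroup v2 setup
usually involves might look like this (a sketch only; the drop-in path is arbitrary,
and Delegate=Yes may already be present in the packaged slurmd.service, so check
before adding it):

    # Hypothetical drop-in: /etc/systemd/system/slurmd.service.d/cgroup-v2.conf
    [Service]
    # Give slurmd its own delegated cgroup subtree, as the cgroup/v2 plugin expects
    Delegate=Yes

    # Then reload and restart:
    #   systemctl daemon-reload
    #   systemctl restart slurmd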
I haven't seen anything that allows for disabling a defined Gres device. It
does seem to work if I define the GPUs that I don't want to use and then
specifically submit jobs to the other GPUs using --gres like
"--gres=gpu:rtx_2080_ti:1". I suppose if I set the GPU Type to be "COMPUTE" for
the GPU [...]
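To make that concrete, the request would look roughly like this (a sketch; the type
string only works if it matches a Type= value in gres.conf, and job.sh is just a
placeholder):

    # Request one GPU of an explicitly named type
    sbatch --gres=gpu:rtx_2080_ti:1 job.sh

    # Equivalent with the newer --gpus option (type first, then count)
    sbatch --gpus=rtx_2080_ti:1 job.sh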
It's not so much whether a job may or may not access the GPU but rather which
GPU(s) is(are) included in $CUDA_VISIBLE_DEVICES. That is what controls what
our CUDA jobs can see and therefore use (within any cgroups constraints, of
course). In my case, Slurm is sometimes setting $CUDA_VISIBLE_DEVICES [...]
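An easy way to see what a given job is actually handed (just a sketch, assuming a
node with at least one configured GPU):

    # nvidia-smi ignores CUDA_VISIBLE_DEVICES, so it lists every device the job can
    # still reach, while the echo shows what CUDA itself will enumerate
    srun --gres=gpu:1 bash -c 'echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"; nvidia-smi -L'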
Very interesting issue.
I am guessing there might be a workaround: since oryx has 2 GPUs, you
could instead define both of them but disable the GT 710. Does Slurm
support this?
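Something along these lines is what I was picturing, as a rough sketch only (the
File= paths and counts are guesses and must match the real devices on oryx):

    # gres.conf on oryx
    NodeName=oryx Name=gpu Type=rtx_2080_ti File=/dev/nvidia0
    NodeName=oryx Name=gpu Type=gt_710      File=/dev/nvidia1

    # slurm.conf node entry would then advertise both types, e.g.:
    #   NodeName=oryx ... Gres=gpu:rtx_2080_ti:1,gpu:gt_710:1
    # so jobs asking for --gres=gpu:rtx_2080_ti:1 are never allocated the GT 710.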
Best,
Feng
On Tue, Jun 27, 2023 at 9:54 AM Wilson, Steven M wrote:
>
> Hi,
>
> I manually configure the [...]
On 7/14/23 10:20 am, Wilson, Steven M wrote:
I upgraded Slurm to 23.02.3 but I'm still running into the same problem.
Unconfigured GPUs (those absent from gres.conf and slurm.conf) are still
being made available to jobs so we end up with compute jobs being run on
GPUs which should only be used [...]
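(In case it is relevant, the piece that normally keeps jobs away from devices they
were not allocated is the cgroup device constraint; a sketch of the settings
involved, not necessarily what these nodes run:)

    # cgroup.conf
    ConstrainDevices=yes      # limit each job to the gres devices it was allocated
    ConstrainCores=yes
    ConstrainRAMSpace=yes

    # slurm.conf must also use the cgroup task plugin for this to take effect:
    #   TaskPlugin=task/cgroup,task/affinity
    # Note: the device constraint is built from the devices listed in gres.conf,
    # which may be why GPUs missing from gres.conf slip through.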
Any ideas?
Thanks,
Steve