You can probably have a job_submit Lua script that looks at the --gpus flag
(and maybe the --gres=gpu:* flag as well) and forces a GPU type. A bit
complicated, and not sure if it will catch srun submissions. I don't think
this is flexible enough to ensure they get the least powerful GPU among all
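
For what it's worth, a minimal sketch of what such a job_submit Lua plugin
could look like. The default type "a100", the reliance on the tres_per_job /
tres_per_node fields, and the string formats are assumptions; the exact
strings ("gpu:N" vs "gres:gpu:N") differ between Slurm versions, so treat
this as a starting point rather than a drop-in solution.

-- job_submit.lua sketch: give GPU requests that carry no type an explicit
-- default type. DEFAULT_GPU_TYPE and the field handling are assumptions.
local DEFAULT_GPU_TYPE = "a100"

local function force_default_type(job_desc, field)
   local spec = job_desc[field]
   if spec == nil then return end
   -- "...gpu:N" with no type -> "...gpu:<default>:N"; typed requests
   -- such as "gres:gpu:a100:2" are left untouched
   local prefix, count = string.match(spec, "^(.-gpu):(%d+)$")
   if prefix and count then
      job_desc[field] = prefix .. ":" .. DEFAULT_GPU_TYPE .. ":" .. count
   end
end

function slurm_job_submit(job_desc, part_list, submit_uid)
   force_default_type(job_desc, "tres_per_job")   -- typically set by --gpus
   force_default_type(job_desc, "tres_per_node")  -- typically set by --gres=gpu:N
   return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
   return slurm.SUCCESS
end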
Hello,
Apologies if this is in the docs but I couldn't find it anywhere.
I've been using Slurm to run a small 7-node cluster in a research lab for a
couple of years now (I'm a PhD student). A couple of our nodes have
heterogeneous GPU models. One in particular has quite a few: 2x NVIDIA A100s,
Hi,
Does anyone know if there are any LSF wrappers (bsub, bjobs, bkill, etc.)
that can work with Slurm?
What I found so far is a table that converts LSF commands to Slurm commands.
Any info will be appreciated
Thanks,
Amir
Hi,
I am looking at parsing some data and submitting lots of jobs to SLURM
and was wondering if there is a way to describe all the jobs and their
dependencies in some JSON file and submit that JSON file instead of making
individual calls to SLURM?
Cheers
--
Nicholas Yue
https://www.linkedin.c
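
I don't know of a single JSON file that sbatch itself will take, but a small
driver script can walk a job description and chain the submissions with
--dependency=afterok. A sketch below; the job names, scripts and dependency
graph are invented for illustration, and reading an actual JSON file would
need a Lua module such as dkjson (here it is just an inline table).

#!/usr/bin/env lua
-- Sketch: submit a set of jobs and wire up their dependencies by chaining
-- sbatch --dependency=afterok calls. All job/script names are placeholders.
local jobs = {
   { name = "prepare", script = "prepare.sh", deps = {} },
   { name = "analyse", script = "analyse.sh", deps = { "prepare" } },
   { name = "report",  script = "report.sh",  deps = { "analyse" } },
}

local jobid = {}   -- job name -> Slurm job id

for _, job in ipairs(jobs) do
   local cmd = "sbatch"
   if #job.deps > 0 then
      local ids = {}
      for _, dep in ipairs(job.deps) do table.insert(ids, jobid[dep]) end
      cmd = cmd .. " --dependency=afterok:" .. table.concat(ids, ":")
   end
   cmd = cmd .. " " .. job.script

   -- sbatch normally answers with "Submitted batch job <id>"
   local out = io.popen(cmd):read("*a")
   jobid[job.name] = assert(string.match(out, "(%d+)"), "sbatch failed: " .. cmd)
   print(job.name .. " -> job " .. jobid[job.name])
end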
Hi Thomas,
I think the Slurm power_save is not problematic for us with bare-metal
on-premise nodes, in contrast to the problems you're having.
We use power_save with on-premise nodes where we control the power down/up
by means of IPMI commands as provided in the scripts which you will find
i
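
For anyone wanting the general shape of those scripts: SuspendProgram (and
ResumeProgram) in slurm.conf receive the node list as their argument, which
"scontrol show hostnames" can expand, and each expanded node is then powered
off or on via its BMC. The "-bmc" hostname convention and the credentials in
this sketch are placeholders only.

#!/usr/bin/env lua
-- Sketch of a SuspendProgram for power_save: expand the node list handed
-- over by slurmctld and power each node off through its BMC with ipmitool.
-- A matching ResumeProgram would run "chassis power on" instead.
local nodelist = arg[1]                        -- e.g. "node[01-04]"
local hosts = io.popen("scontrol show hostnames " .. nodelist)

for node in hosts:lines() do
   local bmc = node .. "-bmc"                  -- placeholder naming scheme
   print("suspending " .. node .. " via " .. bmc)
   os.execute(string.format(
      "ipmitool -I lanplus -H %s -U admin -P secret chassis power off", bmc))
end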
On Wed, 29 Mar 2023 14:42:33 +0200, Ben Polman wrote:
> I'd be interested in your kludge, we face a similar situation where the
> slurmctld node
> does not have access to the ipmi network and can not ssh to machines
> that have access.
> We are thinking on creating a rest interface to a contro
I'd be interested in your kludge, we face a similar situation where the
slurmctld node does not have access to the IPMI network and cannot ssh to
machines that have access.
We are thinking of creating a REST interface to a control server which
would run the IPMI commands
Ben
On 29-
On Mon, 27 Mar 2023 13:17:01 +0200, Ole Holm Nielsen wrote:
> FYI: Slurm power_save works very well for us without the issues that you
> describe below. We run Slurm 22.05.8, what's your version?
I'm sure that there are setups where it works nicely;-) For us, it
didn't, and I was faced with h
Hello,
On 29.03.23 10:08, René Sitt wrote:
While the cited procedure works great in general, it gets more
complicated for heterogeneous setups, i.e. if you have several GPU types
defined in gres.conf, since the 'tres_per_' fields can then take the
form of either 'gres:gpu:N' or 'gres:gpu:<type>:N'
Hello,
maybe some additional notes:
While the cited procedure works great in general, it gets more
complicated for heterogeneous setups, i.e. if you have several GPU types
defined in gres.conf, since the 'tres_per_' fields can then take the
form of either 'gres:gpu:N' or 'gres:gpu:<type>:N' - depen
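
In case it helps, a small Lua pattern that copes with both forms; whether the
exact strings match a given Slurm release is something to verify, so this is
only a sketch.

-- Count GPUs from a tres_per_* string that may or may not carry a type,
-- e.g. "gres:gpu:4" vs "gres:gpu:a100:4".
local function gpu_count(spec)
   if spec == nil then return 0 end
   local n = string.match(spec, "gpu:[%w_%-]+:(%d+)")   -- typed: gres:gpu:<type>:N
   n = n or string.match(spec, "gpu:(%d+)")             -- untyped: gres:gpu:N
   return tonumber(n) or 0
end

print(gpu_count("gres:gpu:4"))        -- 4
print(gpu_count("gres:gpu:a100:2"))   -- 2
print(gpu_count(nil))                 -- 0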
Hi Frank,
use Features on the nodes: every CPU-only node gets e.g. "cpu", every GPU
node e.g. "gpu".
If a job asks for no GPUs, set an additional constraint "cpu" for the job.
Best
Marcus
On 29.03.2023 at 01:24, Frank Pari wrote:
Well, I wanted to avoid using lua. But, it looks like that's goi
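
If the Lua route does turn out to be necessary, the enforcement half of
Marcus' suggestion is fairly small. A sketch, assuming nodes are tagged
Features=cpu / Features=gpu in slurm.conf and that the tres_per_* fields are
available in your Slurm version (formats vary between releases):

-- job_submit.lua sketch of the Feature idea: jobs that request no GPU get
-- the extra constraint "cpu", so they only match nodes tagged Features=cpu.
local function wants_gpu(job_desc)
   local specs = { job_desc.tres_per_job, job_desc.tres_per_node,
                   job_desc.tres_per_task, job_desc.tres_per_socket }
   for i = 1, 4 do
      if specs[i] and string.find(specs[i], "gpu") then return true end
   end
   return false
end

function slurm_job_submit(job_desc, part_list, submit_uid)
   if not wants_gpu(job_desc) then
      if job_desc.features == nil or job_desc.features == "" then
         job_desc.features = "cpu"
      else
         job_desc.features = job_desc.features .. "&cpu"   -- AND with existing constraints
      end
   end
   return slurm.SUCCESS
end
-- (the plugin also expects a slurm_job_modify function; omitted here)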
Hi,
We have dedicated partitions for GPUs (their names end with _gpu) and simply
forbid a job that is not requesting GPU resources from using these partitions:
local function job_total_gpus(job_desc)
-- return total number of GPUs allocated to the job
-- there are many ways to request a GPU
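
A minimal sketch of how the rest of such a check might look, assuming the
"_gpu" partition naming described above; the GPU counting is simplified to
the common tres_per_* fields, whose exact formats vary between Slurm versions.

-- Sketch: reject jobs that ask for a "*_gpu" partition without any GPU request.
local function job_total_gpus(job_desc)
   -- return total number of GPUs requested by the job (simplified)
   local total = 0
   local specs = { job_desc.tres_per_job, job_desc.tres_per_node,
                   job_desc.tres_per_task, job_desc.tres_per_socket }
   for i = 1, 4 do
      if specs[i] then
         local n = string.match(specs[i], "gpu:%w+:(%d+)")   -- typed: gres:gpu:<type>:N
         n = n or string.match(specs[i], "gpu:(%d+)")        -- untyped: gres:gpu:N
         total = total + (tonumber(n) or 0)
      end
   end
   return total
end

function slurm_job_submit(job_desc, part_list, submit_uid)
   local partition = job_desc.partition or ""
   if string.match(partition, "_gpu$") and job_total_gpus(job_desc) == 0 then
      slurm.log_user("GPU partitions require a GPU request, e.g. --gres=gpu:1")
      return slurm.ERROR
   end
   return slurm.SUCCESS
end
-- (the plugin also expects a slurm_job_modify function; omitted here)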