Hi,
We are testing the MIG deployment on our new Slurm compute node with 4 x
H100 GPUs. It looks like everything is configured correctly, but we have a
problem accessing the MIG devices. When I submit jobs requesting a MIG GPU
device with #SBATCH --gres=gpu:H100_1g.10gb:1, the jobs get submitted to the
node,
Hi,
What are potential bad side effects of using a large/larger MessageTimeout?
And is there a value at which this setting is too large (long)?
Thanks,
Herc
Hi Ümit, Troy,
I removed the line “#SBATCH --gres=gpu:1” and changed the sbatch directive
“--gpus-per-node=4” to “--gpus-per-node=1”, but I am still getting the same
result: when running multiple sbatch commands for the same script, only one job
(the first execution) is running, and all subsequent jobs
On 1/18/24 17:42, Felix wrote:
I started a new AMD node, and the error is as follows:
"CPU frequency setting not configured for this node"
The extended output looks like this:
[2024-01-18T18:28:06.682] CPU frequency setting not configured for this node
[2024-01-18T18:28:06.691] slurmd started on Thu, 18
Hello
I started a new AMD node, and the error is as follows:
"CPU frequency setting not configured for this node"
The extended output looks like this:
[2024-01-18T18:28:06.682] CPU frequency setting not configured for this node
[2024-01-18T18:28:06.691] slurmd started on Thu, 18 Jan 2024 18:28:06 +0200
[
when they developed MPS, so I
guess our pattern may not be typical (or at least not universal), and in that
case the MPS plugin may well be what you need.
Hi Hafedh,
Your job script has the sbatch directive “--gpus-per-node=4” set. I suspect
that if you look at what’s allocated to the running job by doing “scontrol show
job <jobid>” and looking at the TRES field, you’ll see it has been allocated 4
GPUs instead of one.
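For example, something along these lines (the job ID is a placeholder, and the
TRES values in the comment are only illustrative):

  scontrol show job <jobid> | grep -i tres
  # the TRES= line should show something like: cpu=8,mem=32G,node=1,gres/gpu=4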
Regards,
--Troy
This line also has to be changed:
#SBATCH --gpus-per-node=4 → #SBATCH --gpus-per-node=1
--gpus-per-node seems to be the new parameter that is replacing the --gres=
one, so you can remove the --gres line completely.
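A minimal sketch of how the top of the job script might then look (the job name
and the payload command are just illustrative):

  #!/bin/bash
  #SBATCH --job-name=gpu-test       # illustrative name
  #SBATCH --gpus-per-node=1         # request a single GPU
  # (no --gres=gpu:... line any more)

  srun nvidia-smi                   # illustrative payload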
Best
Ümit
Hi Noam and Matthias,
Thanks both for your answers.
I replaced the “#SBATCH --gres=gpu:4” directive (in the batch script) with
“#SBATCH --gres=gpu:1” as you suggested, but it didn’t make a difference:
running this batch script 3 times still results in the first job being in a
running state, wh
On Jan 18, 2024, at 7:31 AM, Matthias Loose wrote:
Hi Hafedh,
I'm no expert in the GPU side of SLURM, but looking at your current
configuration, to me it's working as intended at the moment. You have defined
4 GPUs and start multiple jobs, each consuming 4 GPUs. So the jobs wait for
the ressou
Hi Hafedh,
I'm no expert in the GPU side of SLURM, but looking at your current
configuration, to me it's working as intended at the moment. You have
defined 4 GPUs and start multiple jobs, each consuming 4 GPUs. So
the jobs wait for the resources to be free again.
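Schematically (the directive values are only illustrative):

  #SBATCH --gres=gpu:4   # each job takes all 4 GPUs, so jobs run one after another
  #SBATCH --gres=gpu:1   # each job takes one GPU, so up to 4 jobs can run at once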
I think what you need to l
Hello Experts,
I'm a new Slurm user (so please bear with me :) ...).
Recently we've deployed Slurm version 23.11 on a very simple cluster, which
consists of a Master node (acting as a Login & Slurmdbd node as well), a
Compute Node which has an NVIDIA HGX A100-SXM4-40GB GPU, detected as 4 x GPUs
Can you not also do this with a single configuration file by configuring
multiple clusters, which the user can choose with the -M option? I suppose it
depends on the use case; if you want to be able to choose a dev cluster over
the production one, to test new config options, then the environmen
Hi Christine,
yes, you can either set the environment variable SLURM_CONF to the full
path of the configuration file you want to use and then run any program,
or you can do it like this:
SLURM_CONF=/your/path/to/slurm.conf sinfo|sbatch|srun|...
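Spelled out, the two variants would look something like this (the path and the
job script name are just placeholders):

  # variant 1: export once for the whole shell session
  export SLURM_CONF=/etc/slurm-cluster2/slurm.conf
  sinfo

  # variant 2: set it only for a single command
  SLURM_CONF=/etc/slurm-cluster2/slurm.conf sbatch job.sh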
But I am not quite sure if this is really the be
LEROY Christine 208562 writes:
> Is there an env variable in SLURM to tell where the slurm.conf is?
> We would like to have, on the same client node, 2 types of possible
> submissions to address 2 different clusters.
According to man sbatch:
SLURM_CONF    The location of the Slurm configuration file.
Hello all,
Is there an env variable in SLURM to tell where the slurm.conf is?
We would like to have, on the same client node, 2 types of possible
submissions to address 2 different clusters.
Thanks in advance,
Christine
Hi Wirawan,
in general `--gres=gpu:6´ actually means six units of a generic resource
named `gpu´ per node. Each unit may or may not be associated with a physical
GPU device. I'd check the node configuration for the number of gres=gpu
resource units that are configured for that node.
scont
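For example, something along these lines (the node name is a placeholder):

  scontrol show node <nodename> | grep -i gres
  # check the Gres= and CfgTRES= lines, e.g. Gres=gpu:6 (value is illustrative)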