Hi,
We have Slurm 21.08.6 and GPUs in our compute nodes. We want to restrict or
disable the use of the "--exclusive" flag in srun for our users. How should we
do it?
--
Thanks and regards,
PVD
For assimilation and dissemination of knowledge, visit cakes.cdac.in
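(One approach, sketched under the assumption that slurm.conf's OverSubscribe
semantics are as documented; the partition and node names below are made up.
Forcing oversubscription on a partition leaves users no way to demand whole
nodes, so "--exclusive" is effectively ignored there:

PartitionName=gpu Nodes=gpunode[01-04] OverSubscribe=FORCE:1

FORCE:1 still places at most one job per CPU, so nothing is actually shared;
it only removes the user's ability to insist on exclusive access. A stricter
alternative would be a job_submit plugin that rejects jobs requesting
exclusive access.)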
---
If you want to have the same number of processes per node, like:
#PBS -l nodes=4:ppn=8
then what I am doing (maybe there is another way?) is:
#SBATCH --ntasks-per-node=8
#SBATCH --nodes=4
#SBATCH --mincpus=8
This is because "--ntasks-per-node" is actually the "maximum number of tasks
per node", so by itself it does not guarantee exactly 8 tasks on every node;
"--mincpus=8" enforces the lower bound on each node.
Thank you! We recently converted from PBS, and I was converting “ppn=X” to
“-n X”. Does it make more sense to convert “ppn=X” to “--cpus-per-task=X”?
Thanks again
David
On Thu, Mar 24, 2022 at 3:54 PM Thomas M. Payerle wrote:
> Although all three cases ( "-N 1 --cpus-per-task 64 -n 1", "-N 1
> --cpus-per-task 1 -n 64", and "-N 1 --cpus-per-task 32 -n 2") will cause
> Slurm to allocate 64 cores to the job, there can (and will) be differences
> in the other respects.
...it is a bit arcane, but it's not like we're funding lavish
lifestyles with our support payments. I would prefer to see a slightly more
differentiated support system, but this suffices...
On Thu, Mar 24, 2022 at 6:06 PM Sean Crosby wrote:
> Hi Jeff,
>
> The support system is here - https://bugs.schedmd.com/
Hi Jeff,
The support system is here - https://bugs.schedmd.com/
Create an account, log in, and when creating a request, select your site from
the Site selection box.
Sean
From: slurm-users on behalf of Jeffrey R. Lang
Sent: Friday, 25 March 2022 08:48
To: slurm-users
Jeff,
I will reach out to you directly.
-Jason
On Thu, Mar 24, 2022 at 3:51 PM Jeffrey R. Lang wrote:
> Can someone provide me with instructions on how to open a support case
> with SchedMD?
>
> We have a support contract, but nowhere on their website can I find a
> link to open a case with them.
Can someone provide me with instructions on how to open a support case with
SchedMD?
We have a support contract, but nowhere on their website can I find a link to
open a case with them.
Thanks,
Jeff
Although all three cases ( "-N 1 --cpus-per-task 64 -n 1", "-N 1
--cpus-per-task 1 -n 64", and "-N 1 --cpus-per-task 32 -n 2") will cause
Slurm to allocate 64 cores to the job, there can (and will) be differences
in the other respects.
The variable SLURM_NTASKS will be set to the argument of the -n flag.
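A hedged way to see this from the shell (these are standard variables srun
exports to tasks; the expected values are my reading of the defaults, not
verified here):

$ srun -N1 -n1 --cpus-per-task=64 printenv SLURM_CPUS_PER_TASK   (expect: 64)
$ srun -N1 -n64 printenv SLURM_NTASKS | sort -u                  (expect: 64)

Note that SLURM_CPUS_PER_TASK is only set when --cpus-per-task is given, and
the second command prints once per task before sort -u collapses the lines.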
My site recently updated to Slurm 21.08.6 and for the most part everything went
fine. Two Ubuntu nodes, however, are having issues: slurmd cannot execve the
jobs on the nodes. As an example:
[jrlang@tmgt1 ~]$ salloc -A ARCC --nodes=1 --ntasks=20 -t 1:00:00 --bell
--nodelist=mdgx01 --partition=…
“Will launch 64 instances of your application, each bound to a single cpu”
This is true for srun, but not for sbatch.
A while back, we did an experiment using “hostname” to verify.
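Roughly (a sketch from memory, not the original experiment):

$ srun -N1 -n64 hostname | wc -l     (64 lines: srun launches one task per -n)

whereas an sbatch script containing "#SBATCH -n 64" that calls "hostname"
prints it once: sbatch runs the script a single time on the first node, and
-n only sizes the allocation.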
On Thu, Mar 24, 2022 at 12:47 PM Ralph Castain wrote:
> Well, there is indeed a difference - and it is significant.
Well, there is indeed a difference - and it is significant.
> On Mar 24, 2022, at 12:32 PM, David Henkemeyer wrote:
>
> Assuming -N is 1 (meaning, this job needs only one node), then is there a
> difference between any of these 3 flag combinations:
>
> -n 64 (leaving cpus-per-task to be the default of 1)
Assuming -N is 1 (meaning, this job needs only one node), then is there a
difference between any of these 3 flag combinations:
-n 64 (leaving cpus-per-task to be the default of 1)
--cpus-per-task 64 (leaving -n to be the default of 1)
--cpus-per-task 32 -n 2
As far as I can tell, there is no functional difference.
Here is an example command for getting parseable output from sacct of all
completed jobs during a specific period of time:
$ sacct -p -X -a -S 032322 -E 032422 -o JobID,User,State -s ca,cd,f,to,pr,oom
The fields are separated by | and can easily be parsed by awk.
Example output:
JobID|User|State
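From there awk does the rest. For example, a hedged one-liner to count jobs
per user (field 2 is User, matching the -o list above; NR>1 skips the header):

$ sacct -p -X -a -S 032322 -E 032422 -o JobID,User,State -s ca,cd,f,to,pr,oom \
  | awk -F'|' 'NR>1 {n[$2]++} END {for (u in n) print u, n[u]}'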
Hi Durai,
I see the same thing as you on our test-cluster that has ThreadsPerCore=2
configured in slurm.conf.
The double-foo goes away with this:
srun --cpus-per-task=1 --hint=nomultithread echo foo
Having multithreading enabled leads to imho surprising behaviour of
Slurm. My impression is that the smallest unit the scheduler hands out is a
whole core, so a one-CPU request really gets both hardware threads, and srun
then starts one task per allocated thread.
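If that explanation is right, it should be visible directly (a sketch,
untested):

$ srun --cpus-per-task=1 printenv SLURM_CPUS_ON_NODE

I would expect 2 without --hint=nomultithread and 1 with it.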
I don't think that is part of sacct options. Feature request maybe.
Meanwhile, awk would be your friend here. Just post-process by piping
the output to awk and doing the substitutions before printing the output.
eg:
sacct |awk '{sub("CANCELLED","CA");sub("RUNNING","RU");print}'
Just add more sub() calls for any other state names you want shortened.
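For example, with the short forms squeue uses (a sketch, untested):

sacct | awk '{sub("CANCELLED","CA"); sub("COMPLETED","CD"); sub("PENDING","PD"); sub("RUNNING","R"); sub("TIMEOUT","TO"); print}'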
Hi Chip,
Use the sacct -p or --parsable option to get the complete output delimited
by |
/Ole
On 3/24/22 14:12, Chip Seraphine wrote:
I’m trying to shave a few columns off of some sacct output, and while it will
happily accept the short codes (e.g. CA instead of CANCELLED) I can’t find a
way to get it to report them.
I’m trying to shave a few columns off of some sacct output, and while it will
happily accept the short codes (e.g. CA instead of CANCELLED) I can’t find a
way to get it to report them. Shaving down the columns using %N in --format
just results in a truncated version of the long code, not the short form.
Hello Slurm users,
We are experiencing strange behavior with srun executing commands twice
only when setting --cpus-per-task=1
$ srun --cpus-per-task=1 --partition=gpu-2080ti echo foo
srun: job 1298286 queued and waiting for resources
srun: job 1298286 has been allocated resources
foo
foo
This i
cgroups can control access to devices (e.g. /dev/nvidia0), which is how I
understand it to work.
-Sean
On Thu, Mar 24, 2022 at 4:27 AM wrote:
> Well, this is indeed the point. We didn’t set *ConstrainDevices=yes *in
> cgroup.conf. After adding this, gpu restriction works as expected.
>
> But what is the relation between gpu restriction and cgroup?
Well, this is indeed the point. We didn’t set ConstrainDevices=yes in
cgroup.conf. After adding this, gpu restriction works as expected.
But what is the relation between gpu restriction and cgroup? I had never heard
that cgroups could limit GPU usage. Isn’t it a feature of CUDA or the NVIDIA
driver?
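For anyone hitting the same thing: the whole change is one line in cgroup.conf
(plus restarting slurmd), and the effect is easy to check (a sketch):

ConstrainDevices=yes

$ srun --gres=gpu:1 nvidia-smi -L     (should list exactly one GPU)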