Gaussian? Look for NProc=8 or similar lines (NProcShared and other variants, too) in their input files. There could also be some system-wide parallel settings for Gaussian, but that wouldn't be the default.
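A quick way to check, sketched with a hypothetical input filename:
=
# look for per-job thread requests in a Gaussian input file
grep -iE '^%nproc(shared)?' input.com
=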
> On Jul 10, 2018, at 2:04 PM, Mahmood Naderan wrote:
>
> Hi,
> I see that althoug
Looking at your script, there’s a chance that by only specifying ntasks instead
of ntasks-per-node or a similar parameter, you might have allocated 8 CPUs on
one node, and the remaining 4 on another.
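A minimal sketch of the difference for a hypothetical 12-CPU job, using standard sbatch flags:
=
# may be split across nodes:
#SBATCH --ntasks=12

# keeps all 12 tasks on a single node:
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=12
=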
Regardless, I’ve dug into my Gaussian documentation, and here’s my test case
for you to see wha
You’re getting the same fundamental error in both the interactive and batch
version, though.
The ‘reinit: Reading from standard input’ line seemed off, since you were
providing an argument for the input file. But all the references I find to
running Siesta in their manual (section 3 and section
Chris’ method will set CUDA_VISIBLE_DEVICES like you’re used to, and it will
help keep you or your users from picking conflicting devices.
My cgroup/GPU settings from slurm.conf:
=
[renfro@login ~]$ egrep -i '(cgroup|gpu)' /etc/slurm/slurm.conf | grep -v '^#'
ProctrackType=proctrack/cgroup
Hey, folks. I’ve got a Slurm 17.02 cluster (RPMs provided by Bright Computing,
if it matters) with both gigabit Ethernet and Infiniband interfaces. Twice in
the last year, I’ve had a failure inside the stacked Ethernet switches that’s
caused Slurm to lose track of node and job state. Jobs kept r
Depending on the scale (what percent are Fluent users, how many nodes you
have), you could use exclusive mode on either a per-partition or per-job basis.
Here, my (currently few) Fluent users do all their GUI work off the cluster,
and just submit batch jobs using the generated case and data file
A 'nice -n 19' process will still consume 100% of the CPU if nothing else is
going on.
‘top’ output from a dual-core system with 3 ‘dd’ processes -- 2 with default
nice value of 0, and 1 with a nice value of 19:
=
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
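If you want to reproduce that comparison, a rough sketch (run it somewhere you can spare two cores for a moment):
=
# two busy loops at the default nice value, one at nice 19
dd if=/dev/zero of=/dev/null &
dd if=/dev/zero of=/dev/null &
nice -n 19 dd if=/dev/zero of=/dev/null &
top -b -n 1 | head -n 12
# clean up afterward
kill %1 %2 %3
=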
We have multiple partitions using the same nodes. The interactive partition is
high priority and limited on time and resources. The batch partition is low
priority and has looser time and resource restrictions.
And we have a shell function that calls srun --partition=interactive --pty $SHELL
to m
Thanks for your response Mike. I have a follow-up question for this approach.
How do you restrict someone to start an interactive session on the "batch"
partition?
On Wed, Sep 19, 2018 at 12:50 PM Renfro, Michael wrote:
We ha
rtitions? Currently
> they use
>
> srun -n 1 -c 6 --x11 -A monthly -p CAT --mem=32GB ./fluent.sh
>
> where fluent.sh is
>
> #!/bin/bash
> unset SLURM_GTIDS
> /state/partition1/ansys_inc/v140/fluent/bin/fluent
>
>
> Regards,
> Mahmood
If your workflows are primarily CPU-bound rather than memory-bound, and since
you’re the only user, you could ensure all your Slurm scripts ‘nice’ their
Python commands, or use the -n flag for slurmd and the PropagatePrioProcess
configuration parameter. Both of these are in the thread at
https:
Anecdotally, I’ve had a user cause load averages of 10x the node’s core count.
The user caught it and cancelled the job before I noticed it myself. Where I’ve
seen it happen live on less severe cases, I’ve never noticed anything other
than the excessive load average. Viewed from ‘top’, the offen
A reservation overlapping with times you have the node in drain?
Drain and reserve:
# scontrol update nodename=node[037] state=drain reason="testing"
# scontrol create reservation users=renfro reservationname='drain_test'
nodes=node[037] starttime=2018-10-05T08:17:00 endtime=2018-10-05T09:00:00
Hey, folks. Been working on a job submit filter to let us use otherwise idle
cores in our GPU nodes.
We’ve got 40 non-GPU nodes and 4 GPU nodes deployed, each has 28 cores. We’ve
had a set of partitions for the non-GPU nodes (batch, interactive, and debug),
and another set of partitions for the
From https://stackoverflow.com/a/46176694:
>> I had the same requirement to force users to specify accounts and, after
>> finding several ways to fulfill it with slurm, I decided to revive this post
>> with the shortest/easiest solution.
>>
>> The slurm lua submit plugin sees the job descripti
What does scontrol show partition EMERALD give you? I’m assuming its
AllowAccounts output won’t match your /etc/slurm/parts settings.
> On Dec 2, 2018, at 12:34 AM, Mahmood Naderan wrote:
>
> Hi
> Although I have created an account and associated that to a partition, but
> the submitted job re
For the simpler questions (for the overall job step, not real-time), you can run 'sacct --format=all' to get data on completed jobs, and then:
- compare the MaxRSS column to the ReqMem column to see how far off their memory request was
- compare the TotalCPU column to the product of the NCPUS and Elapsed columns to see how well they used the CPUs they requested
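For example, to pull just those columns for a finished job (job ID is hypothetical):
=
sacct -j 123456 --format=JobID,ReqMem,MaxRSS,NCPUS,Elapsed,TotalCPU
=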
Literal job arrays are built into Slurm:
https://slurm.schedmd.com/job_array.html
Alternatively, if you wanted to allocate a set of CPUs for a parallel task, and
then run a set of single-CPU tasks in the same job, something like:
#!/bin/bash
#SBATCH --ntasks=30
srun --ntasks=${SLURM_NTASK
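A fuller sketch of that second pattern (program names are placeholders):
=
#!/bin/bash
#SBATCH --ntasks=30
# one parallel step across the whole allocation
srun --ntasks=${SLURM_NTASKS} ./parallel_program
# then pack single-CPU steps into the same allocation
for i in $(seq 1 30); do
    srun --ntasks=1 --exclusive ./serial_program "$i" &
done
wait
=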
Not sure what the reasons behind “have to manually ssh to a node”, but salloc
and srun can be used to allocate resources and run commands on the allocated
resources:
Before allocation, regular commands run locally, and no Slurm-related variables
are present:
=
[renfro@login ~]$ hostname
login
Those errors appear to pop up when qemu can't find enough RAM to run. If the #SBATCH lines are only applicable for 'sbatch' and not 'srun' or 'salloc', the '--mem=8G' setting there doesn't affect anything.
- Does the srun version of the command work if you specify 'qemu-system-x86_64
-m 2048' o
Hey, folks. Running 17.02.10 with Bright Cluster Manager 8.0.
I wanted to limit queue-stuffing on my GPU nodes, similar to what
AssocGrpCPURunMinutesLimit does. The current goal is to restrict a user to
having 8 active or queued jobs in the production GPU partition, and block (not
reject) other
150677 gpu omp_hw.sh renfro R 0:06 11 4000M gpunode001 (null)
$ scancel -u $USER -p gpu
> On Jan 25, 2019, at 10:35 AM, Renfro, Michael wrote:
>
> Hey, folks. Running 17.02.10 with Bright Cluster Manager 8.0.
>
> I wanted to limit queue-stu
In case you haven’t already done something similar, I reduced some of the
cumbersome-ness of my job_submit.lua by breaking it out into subsidiary
functions, and adding some logic to detect if I was in test mode or not. Basic
structure, with subsidiary functions defined ahead of slurm_job_submit(
I’m assuming you have LDAP and Slurm already working on all your nodes, and
want to restrict access to two of the nodes based off of Unix group membership,
while letting all users access the rest of the nodes.
If that’s the case, you should be able to put the two towers into a separate
partitio
If you're literally putting spaces around the '=' character, I don't think that's valid shell syntax, and it should throw errors into your slurm-JOBID.out file when you try it.
See if it works with A=1.0 instead of A = 1.0
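A minimal illustration of the difference:
=
A=1.0      # assigns 1.0 to the variable A
echo "$A"
A = 1.0    # error: the shell tries to run a command named 'A'
=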
> On Feb 18, 2019, at 7:55 AM, Castellana Michele
> wrote:
>
> External
If the failures happen right after the job starts (or close enough), I’d use an
interactive session with srun (or some other wrapper that calls srun, such as
fisbatch).
Our hpcshell wrapper for srun is just a bash function:
=
hpcshell ()
{
    srun --partition=interactive $@ --pty bash -i
}
I think all you’re looking for is Generic Resource (GRES) scheduling, starting
at https://slurm.schedmd.com/gres.html — if you’ve already seen that, then more
details would be helpful.
If it all works correctly, then ‘sbatch --gres=gpu scriptname’ should run up to
4 of those jobs and leave the
Can a second user allocate anything on node fl01 after the first user requests
their 12 tasks per node? If not, then it looks like tasks are being tied to
physical cores, and not a hyperthreaded version of a core.
--
Mike Renfro, PhD / HPC Systems Administrator, Information Technology Services
We put a ‘gpu’ QOS on all our GPU partitions, and limit jobs per user to 8 (our
GPU capacity) via MaxJobsPerUser. Extra jobs get blocked, allowing other users
to queue jobs ahead of the extras.
# sacctmgr show qos gpu format=name,maxjobspu
      Name MaxJobsPU
---------- ---------
       gpu         8
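If you need to build something similar, a sketch of the commands (the QOS name 'gpu' is an assumption):
=
sacctmgr add qos gpu
sacctmgr modify qos gpu set MaxJobsPerUser=8
# then reference it from the GPU partition in slurm.conf, e.g. QOS=gpu on its PartitionName line
=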
Should be set on your NodeName lines in slurm.conf. For a 256 GB node, I’ve got:
NodeName=node038 CoresPerSocket=14 RealMemory=254000 Sockets=2 ThreadsPerCore=1
so that users can’t reserve every bit of physical memory, leaving a small
amount for OS operation.
> On May 16, 2019, at 3:47 PM,
Is this output file being written to a central file server that can be accessed
from your submit host? If so, start another ssh session from your local
computer to the submit host.
Is the output file being written to a location only accessible from the compute
node running your job? You might b
ntasks=N as an argument to sbatch or srun? Should work as long as you don’t
have exclusive node settings. From our setup:
[renfro@login ~]$ hpcshell --ntasks=16 # hpcshell is a shell function for 'srun
--partition=interactive $@ --pty bash -i'
[renfro@gpunode001(job 202002) ~]$ srun hostname | s
MATLAB container at NVIDIA’s NGC:
https://ngc.nvidia.com/catalog/containers/partners:matlab
Should be compatible with Docker and Singularity, but read the fine print on
licensing.
> On Sep 19, 2019, at 8:22 AM, Thomas M. Payerle wrote:
>
> While I agree containers can be quite useful in HPC e
Never used Rocks, but as far as Slurm or anything else is concerned,
Singularity is just another program. It will need to be accessible from any
compute nodes you want to use it on (whether that’s from OS-installed packages,
from a shared NFS area, or whatever shouldn’t matter).
So your user wi
DMTCP might be an option? Pretty sure there are RPMs for it in RHEL/CentOS 7.
Don’t recall it being any trouble to install.
http://dmtcp.sourceforge.net/
On Oct 4, 2019, at 9:47 PM, Eliot Moss wrote:
Dear slurm users --
I'm new to slurm (somewhat experienced with Gr
Our cgroup settings are quite a bit different, and we don't allow jobs to swap, but the following works to limit memory here (I know, because I get frequent emails from users who don't change their jobs from the default 2 GB per CPU that we use):
CgroupMountpoint="/sys/fs/cgroup"
CgroupA
Pretty sure you don’t need to explicitly specify GPU IDs on a Gromacs job
running inside of Slurm with gres=gpu. Gromacs should only see the GPUs you
have reserved for that job.
Here’s a verification code you can run to verify that two different GPU jobs
see different GPU devices (compile with
> • Total number of jobs submitted by user (daily/weekly/monthly)
> • Average queue time per user (daily/weekly/monthly)
> • Average job run time per user (daily/weekly/monthly)
Open XDMoD for these three. https://github.com/ubccr/xdmod , plus
https://xdmod.ccr.buffalo.edu (unfo
D this morning while
"searching" for further info...
Would Grafana do similar job as XDMoD?
-Original Message-
From: slurm-users <slurm-users-boun...@lists.schedmd.com> On Behalf Of Renfro, Michael
Sent: 26 November 2019 16:14
To: Slurm User Community Li
We’ve been using that weighting scheme for a year or so, and it works as
expected. Not sure how Slurm would react to multiple NodeName=DEFAULT lines
like you have, but here’s our node settings and a subset of our partition
settings.
In our environment, we’d often have lots of idle cores on GPU
What do you get from
systemctl status slurmdbd
systemctl status slurmctld
I’m assuming at least slurmdbd isn’t running.
> On Dec 10, 2019, at 3:05 PM, Dean Schulze wrote:
>
> External Email Warning
> This email originated from outside the university. Please use caution when
> opening attachme
Snapshot of a job_submit.lua we use to automatically route jobs to a GPU partition if the user asks for a GPU:
https://gist.github.com/mikerenfro/92d70562f9bb3f721ad1b221a1356de5
All our users just use srun or sbatch with a default queue, and the plugin
handles it from there. There’s more de
Hey, folks. I’ve just upgraded from Slurm 17.02 (way behind schedule, I know)
to 19.05. The only thing I’ve noticed going wrong is that my user resource
limits aren’t being applied correctly.
My typical user has a GrpTRESRunMin limit of cpu=1440000 (1000 CPU-days), and after the upgrade, it app
tool prints nicely user limits from the Slurm database:
> https://github.com/OleHolmNielsen/Slurm_tools/tree/master/showuserlimits
>
> Maybe this can give you further insights into the source of problems.
>
> /Ole
>
> On 16-12-2019 17:27, Renfro, Michael wrote:
>> Hey,
Resolved now. On older versions of Slurm, I could have queues without default
times specified (just an upper limit, in my case). As of Slurm 18 or 19, I had
to add a default time to all my queues to avoid the AssocGrpCPURunMinutesLimit
flag.
> On Dec 16, 2019, at 2:00 PM, Renfro, Mich
My current batch queues have a 30-day limit, and I’ll likely be reducing that
to maybe 7 days for most users in the near future, as it will make priority and
fairshare mechanisms more responsive (even if a high-priority job gets bumped
to the top of the queue, it may still have to wait a few day
Hey, folks.
Some of my users submit job after job with no recognition of our 1000 CPU-day
TRES limit, and thus their later jobs get blocked with the reason
AssocGrpCPURunMinutesLimit.
I’ve written up a script [1] using Ole Holm Nielsen’s showuserlimits script [2]
that will identify a user’s sm
The slurm-web project [1] has a REST API [2]. Never used it myself, just used
the regular web frontend for viewing queue and node state.
[1] https://edf-hpc.github.io/slurm-web/index.html
[2] https://edf-hpc.github.io/slurm-web/api.html
> On Jan 24, 2020, at 1:22 PM, Dean Schulze wrote:
>
> Ex
For the first question: you should be able to define each node’s core count,
hyperthreading, or other details in slurm.conf. That would allow Slurm to
schedule (well-behaved) tasks to each node without anything getting overloaded.
For the second question about jobs that aren’t well-behaved (a jo
On this part, I don’t think that’s always the case. On a node with 384 GB (with
2 GB reserved for the OS), we’ve got several jobs running under mem=32000:
=
$ grep 'NodeName=gpunode\[00' /etc/slurm/slurm.conf
NodeName=gpunode[001-003] CoresPerSocket=14 RealMemory=382000 Sockets=2
ThreadsPe
ups is the solution I suppose.
>
> On Tue, Jan 28, 2020 at 7:42 PM Renfro, Michael wrote:
> For the first question: you should be able to define each node’s core count,
> hyperthreading, or other details in slurm.conf. That would allow Slurm to
> schedule (well-behaved) tas
Greetings, fellow general university resource administrator.
Couple things come to mind from my experience:
1) does your serial partition share nodes with the other non-serial partitions?
2) what’s your maximum job time allowed, for serial (if the previous answer was
“yes”) and non-serial parti
" the system. The larger jobs at the
> expense of the small fry for example, however that is a difficult decision
> that means that someone has got to wait longer for results..
>
> Best regards,
> David
> From: slurm-users on behalf of
> Renfro, Michael
> Sent
early
> release of v18.
>
> Best regards,
> David
>
> From: slurm-users on behalf of
> Renfro, Michael
> Sent: 31 January 2020 17:23:05
> To: Slurm User Community List
> Subject: Re: [slurm-users] Longer queuing times for larger jobs
>
> I missed reading w
If you want to rigidly define which 20 nodes are available to the one group of
users, you could define a 20-node partition for them, and a 35-node partition
for the priority group, and restrict access by Unix group membership:
PartitionName=restricted Nodes=node0[01-20] AllowGroups=ALL
Partition
Hey, Matthias. I’m having to translate a bit, so if I get a meaning wrong,
please correct me.
You should be able to set the minimum and maximum number of nodes used for jobs
on a per-partition basis, or to set a default for all partitions. My most
commonly used partition has:
PartitionName=b
If that 32 GB is main system RAM, and not GPU RAM, then yes. Since our GPU
nodes are over-provisioned in terms of both RAM and CPU, we end up using the
excess resources for non-GPU jobs.
If that 32 GB is GPU RAM, then I have no experience with that, but I suspect
MPS would be required.
> On Fe
When I made similar queues, and only wanted my GPU jobs to use up to 8 cores
per GPU, I set Cores=0-7 and 8-15 for each of the two GPU devices in gres.conf.
Have you tried reducing those values to Cores=0 and Cores=20?
> On Feb 27, 2020, at 9:51 PM, Pavel Vashchenkov wrote:
>
> External Email
We have a shared gres.conf that includes node names, which should have the
flexibility to specify node-specific settings for GPUs:
=
NodeName=gpunode00[1-4] Name=gpu Type=k80 File=/dev/nvidia0 COREs=0-7
NodeName=gpunode00[1-4] Name=gpu Type=k80 File=/dev/nvidia1 COREs=8-15
=
See the th
I'm going to guess the job directive changed between earlier releases and 20.02. A version of the page from last year [1] has no mention of hetjob, and uses packjob instead.
On a related note, is there a canonical location for older versions of Slurm
documentation? My local man pages are alway
The release notes at https://slurm.schedmd.com/archive/slurm-19.05.5/news.html
indicate you can upgrade from 17.11 or 18.08 to 19.05. I didn’t find equivalent
release notes for 17.11.7, but upgrades over one major release should work.
> On Mar 11, 2020, at 2:01 PM, Will Dennis wrote:
>
> Exter
In addition to Sean’s recommendation, your user might want to use job arrays
[1]. That’s less stress on the scheduler, and throughput should be equivalent
to independent jobs.
[1] https://slurm.schedmd.com/job_array.html
Rather than configure it to only run one job at a time, you can use job dependencies to make sure only one job of a particular type runs at a time. A singleton dependency [1, 2] should work for this. From [1]:
#SBATCH --dependency=singleton --job-name=big-youtube-upload
in any job script would ens
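A sketch of how that plays out (the job name comes from the example above; the payload script is a placeholder):
=
#!/bin/bash
#SBATCH --job-name=big-youtube-upload
#SBATCH --dependency=singleton
# queue as many copies of this script as you like; only one job
# with this name (per user) will run at a time
./upload.sh
=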
Others might have more ideas, but anything I can think of would require a lot
of manual steps to avoid mutual interference with jobs in the other partitions
(allocating resources for a dummy job in the other partition, modifying the MPI
host list to include nodes in the other partition, etc.).
All of this is subject to scheduler configuration, but: what has job 409978
requested, in terms of resources and time? It looks like it's the highest
priority pending job in the interactive partition, and I’d expect the
interactive partition has a higher priority than the regress partition.
As
Unless I’m misreading it, you have a wall time limit of 2 days, and jobs that
use up to 32 CPUs. So a total CPU time of up to 64 CPU-days would be possible
for a single job.
So if you want total wall time for jobs instead of CPU time, then you’ll want
to use the Elapsed attribute, not CPUTime.
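For example (job ID is hypothetical), to see wall time and CPU time side by side:
=
sacct -j 123456 -X --format=JobID,AllocCPUS,Elapsed,CPUTime
=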
Can’t speak for everyone, but I went to Slurm 19.05 some months back, and
haven't had any problems with CUDA 10.0 or 10.1 (or 8.0, 9.0, or 9.1).
> On Apr 17, 2020, at 8:46 AM, Lisa Kay Weihl wrote:
>
> External Email Warning
>
> This email originated from outside the university. Please use cau
Someone else might see more than I do, but from what you’ve posted, it’s clear
that compute-0-0 will be used only after other lower-weighted nodes are too
full to accept a particular job.
I assume you’ve already submitted a set of jobs requesting enough resources to
fill up all the nodes, and t
That’s a *really* old version, but
https://slurm.schedmd.com/archive/slurm-15.08.13/sbatch.html indicates there’s
an exclusive flag you can set.
On Apr 29, 2020, at 1:54 PM, Rutger Vos wrote:
Hi,
for a smallish machine that has been having degraded performance we want to
implement a pol
> 'd have to specify this when submitting, right? I.e. 'sbatch
> --exclusive myjob.sh', if I understand correctly. Would there be a way to
> simply enforce this, i.e. at the slurm.conf level or something?
>
> Thanks again!
>
> Rutger
>
> On Wed, Apr 29, 2020 at
Assuming you need a scheduler for whatever size your user population is: do they need literal JupyterHub, or would they all be satisfied running regular Jupyter notebooks?
On May 4, 2020, at 7:25 PM, Lisa Kay Weihl wrote:
External Email Warning
This email originated from outside the univer
Have you seen https://slurm.schedmd.com/licenses.html already? If the software is just for use inside the cluster, one Licenses= line in slurm.conf plus users submitting with the -L flag should suffice. Should be able to set that license value to 4 if it's licensed per node and you can run up to
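A sketch of that setup, with an assumed license name and count:
=
# slurm.conf, tracking four seats cluster-wide:
#   Licenses=foo:4
# job submission, requesting one seat:
sbatch -L foo:1 jobscript.sh
=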
ically updated the value based on usage?
>
>
> Regards
> Navin.
>
>
> On Tue, May 5, 2020 at 7:00 PM Renfro, Michael wrote:
> Have you seen https://slurm.schedmd.com/licenses.html already? If the
> software is just for use inside the cluster, one Licenses= line in s
Aside from any Slurm configuration, I’d recommend setting up a modules [1 or 2]
folder structure for CUDA and other third-party software. That handles
LD_LIBRARY_PATH and other similar variables, reduces the chances for library
conflicts, and lets users decide their environment on a per-job basi
specific
> nodes?
> i do not want to create a separate partition.
>
> is there any way to achieve this by any other method?
>
> Regards
> Navin.
>
>
> Regards
> Navin.
>
> On Tue, May 5, 2020 at 7:46 PM Renfro, Michael wrote:
> Haven’t done it yet
in this case]
>
> Regards
> Navin.
>
>
> On Wed, May 6, 2020 at 7:47 PM Renfro, Michael wrote:
> To make sure I’m reading this correctly, you have a software license that
> lets you run jobs on up to 4 nodes at once, regardless of how many CPUs you
> use? That is, y
There are MinNodes and MaxNodes settings that can be defined for each partition
listed in slurm.conf [1]. Set both to 1 and you should end up with the non-MPI
partitions you want.
[1] https://slurm.schedmd.com/slurm.conf.html
From: slurm-users on behalf of
Ho
Hey, folks. I've had a 1000 CPU-day (1,440,000 CPU-minutes) GrpTRESMins limit applied to each user for years. It generally works as intended, but I have one user whose recorded usage is highly inflated from reality, causing the GrpTRESMins limit to be enforced much earlier than necessary:
squ
e user's limits are printed in detail by showuserlimits.
These tools are available from https://github.com/OleHolmNielsen/Slurm_tools
/Ole
On 08-05-2020 15:34, Renfro, Michael wrote:
> Hey, folks. I've had a 1000 CPU-day (144 CPU-minutes) GrpTRESMins
> limit applied to each
s that
already completed, but still get counted against the user's current requests.
From: Ole Holm Nielsen
Sent: Friday, May 8, 2020 9:27 AM
To: slurm-users@lists.schedmd.com
Cc: Renfro, Michael
Subject: Re: [slurm-users] scontrol show assoc_mgr showing m
f,to,pr"
> # Get Slurm individual job accounting records using the "sacct" command
> sacct $partitionselect -n -X -a -S $start_time -E $end_time -o $FORMAT
> -s $STATE
>
> There are numerous output fields which you can inquire, see "sacct -e".
>
> /Ole
>
restart.
Thanks.
> On May 8, 2020, at 11:47 AM, Renfro, Michael wrote:
>
> Working on something like that now. From an SQL export, I see 16 jobs from
> my user that have a state of 7. Both states 3 and 7 show up as COMPLETED in
> sacct, and may also have some duplicate job en
I'd compare the RealMemory part of 'scontrol show node abhi-HP-EliteBook-840-G2' to the RealMemory part of your slurm.conf:
> Nodes which register to the system with less than the configured resources
> (e.g. too little memory), will be placed in the "DOWN" state to avoid
> scheduling jobs on t
Even without the slurm-bank system, you can enforce a limit on resources with a
QOS applied to those users. Something like:
=
sacctmgr add qos bank1 flags=NoDecay,DenyOnLimit
sacctmgr modify qos bank1 set grptresmins=cpu=1000
sacctmgr add account bank1
sacctmgr modify account name=bank1 set
That’s close to what we’re doing, but without dedicated nodes. We have three
back-end partitions (interactive, any-interactive, and gpu-interactive), but
the users typically don’t have to consider that, due to our job_submit.lua
plugin.
All three partitions have a default of 2 hours, 1 core, 2
node with oversubscribe should be sufficient.
> If you can't spare a single node then a VM would do the job.
>
> -Paul Edmon-
>
> On 6/11/2020 9:28 AM, Renfro, Michael wrote:
>> That’s close to what we’re doing, but without dedicated nodes. We have three
>> back-
I think that’s correct. From notes I’ve got for how we want to handle our
fairshare in the future:
Setting up a funded account (which can be assigned a fairshare):
sacctmgr add account member1 Description="Member1 Description" FairShare=N
Adding/removing a user to/from the funded accoun
Will probably need more information to find a solution.
To start, do you have separate partitions for GPU and non-GPU jobs? Do you have
nodes without GPUs?
On Jun 13, 2020, at 12:28 AM, navin srivastava wrote:
Hi All,
In our environment we have GPU. so what i found is if the user having high
Regards
Navin
On Sat, Jun 13, 2020, 20:37 Renfro, Michael wrote:
Not trying to argue unnecessarily, but what you describe is not a universal
rule, regardless of QOS.
Our GPU nodes are members of 3 GPU-related partitions, 2 more resource-limited
non-GPU partitions, and one of two larger-memory partitions. It’s set up this
way to minimize idle resources (due t
There’s a --nice flag to sbatch and srun, at least. Documentation indicates it
decreases priority by 100 by default.
And untested, but it may be possible to use a job_submit.lua [1] to adjust nice
values automatically. At least I can see a nice property in [2], which I assume
means it'd be acce
“The SchedulerType configuration parameter specifies the scheduler plugin to
use. Options are sched/backfill, which performs backfill scheduling, and
sched/builtin, which attempts to schedule jobs in a strict priority order
within each partition/queue.”
https://slurm.schedmd.com/sched_config.ht
If the 500 parameters happened to be filenames, you could adapt something like the following (appropriated from somewhere else, but I can't find the reference quickly):
=
#!/bin/bash
# get count of files in this directory
NUMFILES=$(ls -1 *.inp | wc -l)
# subtract 1 as we have to use zero-based indexing (first e
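One way the rest of that script could go, sketched with an assumed file pattern and program name (sbatch accepts a job script on stdin):
=
#!/bin/bash
# count the matching input files and submit one array task per file
NUMFILES=$(ls -1 *.inp | wc -l)
ZBNUMFILES=$((NUMFILES - 1))
sbatch --array=0-$ZBNUMFILES <<'EOF'
#!/bin/bash
# each array task picks its own file by zero-based index
FILES=(*.inp)
./myprogram "${FILES[$SLURM_ARRAY_TASK_ID]}"
EOF
=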
Probably unrelated to slurm entirely, and most likely has to do with
lower-level network diagnostics. I can guarantee that it’s possible to access
Internet resources from a compute node. Notes and things to check:
1. Both ping and http/https are IP protocols, but are very different (ping
isn’t
Untested, but you should be able to use a job_submit.lua file to detect if the
job was started with srun or sbatch:
* Check with (job_desc.script == nil or job_desc.script == '')
* Adjust job_desc.time_limit accordingly
Here, I just gave people a shell function "hpcshell", which automati
I’ve only got 2 GPUs in my nodes, but I’ve always used non-overlapping CPUs= or
COREs= settings. Currently, they’re:
NodeName=gpunode00[1-4] Name=gpu Type=k80 File=/dev/nvidia[0-1] COREs=0-7,9-15
and I’ve got 2 jobs currently running on each node that’s available.
So maybe:
NodeName=c0005
We’ve run a similar setup since I moved to Slurm 3 years ago, with no issues.
Could you share partition definitions from your slurm.conf?
When you see a bunch of jobs pending, which ones have a reason of “Resources”?
Those should be the next ones to run, and ones with a reason of “Priority” are
The PowerShell script I use to provision new users adds them to an Active
Directory group for HPC, ssh-es to the management node to do the sacctmgr
changes, and emails the user. Never had it fail, and I've looped over entire
class sections in PowerShell. Granted, there are some inherent delays d
One pending job in this partition should have a reason of “Resources”. That job
has the highest priority, and if your job below would delay the
highest-priority job’s start, it’ll get pushed back like you see here.
On Aug 31, 2020, at 12:13 PM, Holtgrewe, Manuel
wrote:
Dear all,
I'm seeing s
We set DefMemPerCPU in each partition to approximately the amount of RAM in a
node divided by the number of cores in the node. For heterogeneous partitions,
we use a lower limit, and we always reserve a bit of RAM for the OS, too. So
for a 64 GB node with 28 cores, we default to 2000 M per CPU,