useful complaint in one
of those, whatever the cause.
--
Paul Brunk, system administrator
Advanced Computing Resource Center
Enterprise IT Svcs, the University of Georgia
On 8/29/23, 11:29 AM, "slurm-users"
wrote:
related libraries.
These are a fantastic resource!
--
Paul Brunk, system administrator
Advanced Computing Resource Center
Enterprise IT Svcs, the University of Georgia
On 3/27/23, 2:29 PM, "slurm-users"
wrote:
starting a Slurm cluster" walkthrough
threads online lately, but haven't seen this particular thing
addressed. I'm aware it might be a non-issue.
--
Paul Brunk, system administrator
Advanced Computing Resource Center
Enterprise IT Svcs, the University of Georgia
ued unless explicitly enabled by the user. Use the
sbatch --no-requeue or --requeue option to change the default
behavior for individual jobs. The default value is 1.
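For an individual job that boils down to something like this (the script name is just a placeholder):

  sbatch --requeue myjob.sh      # allow this job to be requeued regardless of the site default
  sbatch --no-requeue myjob.sh   # never requeue this job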
--
Paul Brunk, system administrator
Advanced Computing Resource Center
Enterprise IT Svcs, the University of Georgia
On 8/18/22, 1:57
Hi:
Thanks for your feedback guys :).
We continue to find srun behaving properly re: core placement.
BTW, we've further established that only MVAPICH (and therefore also Intel MPI)
jobs are encountering the OOM issue.
==
Paul Brunk, system administrator
Georgia Advanced Resource Computing Center
uld make
bigger_qos and smaller_qos, and define those as 'QOS' in the matching
PartitionName entries. Then add whatever ACL or limits you want to
those QOSes. Or use the PartitionName entries if the available
options suffice.
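As a rough, untested sketch (the QOS names, node lists, and limit are placeholders):

  # create the QOSes and hang whatever limits you want off them
  sacctmgr add qos bigger_qos
  sacctmgr add qos smaller_qos
  sacctmgr modify qos smaller_qos set MaxTRESPerUser=node=2

  # slurm.conf: tie each partition to its QOS
  PartitionName=bigger  Nodes=node[01-32] QOS=bigger_qos  State=UP
  PartitionName=smaller Nodes=node[33-40] QOS=smaller_qos State=UP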
--
Paul Brunk, system administrator
Georgia Advanced Resource Computing Center
braries: intel/2019b
Observations:
- Works correctly when using: 1 node x 64 cores (64 MPI processes), 1 node x 128
cores (128 MPI processes) (other QE parameters -nk 1 -nt 4, mem-per-cpu=1500mb)
- A few processes get OOM killed after a while when using: 4 nodes x 32
cores (128 MPI processes), 4 nodes x
m to slurmds at dispatch time, who store them
on each node in the slurm.conf 'SlurmdSpoolDir', as Steffen noted.
All this to say that the slurmctld host doesn't need to see the users' home
dirs and/or job script dirs.
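If anyone wants to poke at this on their own cluster, roughly (the jobid is a placeholder):

  scontrol show config | grep SlurmdSpoolDir   # where dispatched scripts land on each node
  scontrol write batch_script <jobid>          # write out the stored copy of a job's batch script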
==
Paul Brunk, system administrator
Georgia Advanced Resource Computing Center
nd no submissions
would be rejected based on filesystem availability (since the license stuff
can't affect job submission, only dispatch).
I'm sure there could be other solutions. I've not thought further on this
since I've been happily using NHC for a long time.
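For anyone curious, the NHC hookup is just a couple of slurm.conf knobs, roughly like this (the path and interval are examples, not necessarily what we run):

  HealthCheckProgram=/usr/sbin/nhc   # NHC drains a node when a check (e.g. a filesystem mount) fails
  HealthCheckInterval=300            # seconds between runs on each node
  HealthCheckNodeState=ANY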
==
Paul Brunk
f of them in lua filter, or adding
them there) might help too.
--
Paul Brunk, system administrator
Georgia Advanced Resource Computing Center
Enterprise IT Svcs, the University of Georgia
On 2/1/22, 5:45 AM, "slurm-users" wrote:
Hi,
I a
Hi:
You can use e.g. 'sacctmgr show -s users', and you'll see each user's
cluster association as one of the output columns. If the name were
'yourcluster', then you could do: sacctmgr modify cluster
name=yourcluster set grpTres="node=8".
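Spelled out (the cluster name is a placeholder; untested as typed here):

  sacctmgr show -s users                                          # shows each user's associations, incl. cluster
  sacctmgr modify cluster name=yourcluster set grpTres="node=8"   # cap the cluster-wide association at 8 nodes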
==
Paul Brunk
can't infer from the log file names (date
stamps) which completed job log a given day's jobs will appear in.
==
Paul Brunk, system administrator
Georgia Advanced Resource Computing Center
Enterprise IT Svcs, the University of Georgia
On 2/8/22, 9:44 AM, "slurm-users" wrote:
Hi:
Normally, adding a new node requires altering slurm.conf, restarting
slurmctld, and restarting slurmd on each node.
Restarting these daemons should not harm jobs and can be done while existing
jobs are running.
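Roughly, assuming systemd and a parallel-ssh tool like pdsh (both assumptions about your site; hostnames are placeholders):

  # after adding the NodeName line to slurm.conf and pushing it to every node:
  systemctl restart slurmctld                      # on the controller
  pdsh -w node[001-040] systemctl restart slurmd   # on the compute nodes
  scontrol update nodename=node041 state=RESUME    # if the new node comes up drained or unknown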
Wishing that I’d just listened this time,
Paul Brunk, system administrator
ment on space consumption. Good
luck!
--
Wishing that I'd just listened this time,
Paul Brunk, system administrator
Georgia Advanced Computing Resource Center
UGA EITS (formerly UCNS)
arted everywhere). Could
this be what you're seeing (as opposed to /etc/hosts vs DNS)?
--
Wishing that I'd just listened this time,
Paul Brunk, system administrator, Workstation Support Group
GACRC (formerly RCC)
UGA EITS (formerly UCNS)
-----Original Message-----
From: slurm-users On
Hi:
If you mean "why are the nodes still Drained, now that I fixed the
slurm.conf and restarted (never mind whether the RealMem parameter is
correct)?", try 'scontrol update nodename=str957-bl0-0[1-2] State=RESUME'.
--
Paul Brunk, system administrator
Georgia Advanced Computing Resource Center
Hello Byron:
I’m guessing that your job is asking for more HW than the highmem_p
has in it, or more cores or RAM within a node than any of the nodes
have, or something like that. 'scontrol show job 10860160' might
help. You can also look in slurmctld.log for that jobid.
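Concretely, something like this (the log path is a guess; it varies by site):

  scontrol show job 10860160           # check the Reason= field against what you requested
  scontrol show partition highmem_p    # what the partition actually offers
  grep 10860160 /var/log/slurm/slurmctld.log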
--
Paul Brunk
being inaccurate, and don't yet have e.g. a
MaxTresPerX with some RAM value). With our 'cgroup' ProcTrackType, and
requiring a mem spec on all jobs, I think we don't need to worry if a given slurmd
is sending slurmctld wrong or incomprehensible information about a given
Hi:
It's that time again...we're doing travel budget planning. Do we have
a sense of whether or how there will be a user group meeting this
year? I saw the April poll.
Thanks!
--
Grinning like an idiot,
Paul Brunk, system administrator
Georgia Advanced Computing Resource Center
it lua to add a
request for a license of the relevant type to each submission?
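The moving parts would be roughly these (the license name and count are placeholders; the lua filter would just append the same license request to every job):

  # slurm.conf: define a site-level license to stand in for e.g. filesystem availability
  Licenses=scratch_fs:10000     # count just needs to exceed the number of concurrent jobs

  # what the job_submit filter would effectively add to each submission
  sbatch --licenses=scratch_fs:1 myjob.sh

  scontrol show licenses        # see what's defined and in use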
--
Flailing wildly at the keyboard,
Paul Brunk, system administrator
Georgia Advanced Computing Resource Center
Enterprise IT Svcs, the University of Georgia
From: slurm-users On Behalf Of Prentice Bisbal
Sent: Thurs
Hi:
I've not tried to do that. But the below discussion might help:
https://bugs.schedmd.com/show_bug.cgi?id=2626
From: slurm-users On Behalf Of Ahmad Khalifa
Sent: Thursday, June 3, 2021 01:12
To: slurm-users@lists.schedmd.com
Subject: [slurm-users] Specify a gpu ID
ature on a single node,
> where it looks like that node isn't using RPMs with NVML support.
Indeed, this was a PEBCAK problem--I was not heeding the classic "read
the right fine version of the fine manual" (RTRFVOTFM?) advice.
Thanks all for your replies.
--
Jesting grimly,
ng/reading
/var/lib/slurmd/conf-cache/gres.conf
Reverting to the original, one-line gres.conf reverted the cluster to
production state.
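For context, the two shapes of gres.conf in play look roughly like this (not our literal files; node and device names are placeholders):

  # minimal, autodetect-style, one line (needs slurmd built against NVML)
  AutoDetect=nvml

  # explicit per-node entries, which non-NVML slurmd builds can use
  NodeName=gpunode01 Name=gpu Type=a100 File=/dev/nvidia[0-3]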
--
Paul Brunk, system administrator
Georgia Advanced Computing Resource Center
Enterprise IT Svcs, the University of Georgia
and also the management of the
reservation's node membership. I don't assume that a good answer resembles
that at all.
Thanks for any insights!
--
Paul Brunk, system administrator
Georgia Advanced Computing Resource Center (GACRC)
Enterprise IT Svcs, the University of Georgia