On 15/7/19 1:40 am, Kevin Buckley wrote:
Does that mean that having this setting in slurm.conf
PriorityFlags=FAIR_TREE
is now redundant, because it's the default?
Yes, I would believe so. Of course if you wanted to be explicit about
your choice you could just leave it set to that.
All t
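As a quick sanity check (a generic command, not something from this thread), you can confirm which flags the running controller is actually using:

$ scontrol show config | grep PriorityFlags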
Hi,
I am running Slurm version 19.05.0 and Open MPI version 3.1.4. Open MPI is
configured with PMI2 from Slurm. Whenever I try to run an MPI job on more
than one node, I get this error message:
srun: error: mpi/pmi2: failed to send temp kvs to compute nodes
srun: Job step aborted: Waiting
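For reference, a typical way to exercise Slurm's PMI2 plugin with Open MPI looks roughly like this; the node/task counts and the hello_mpi binary are placeholders, not taken from the report:

$ salloc -N 2 -n 8 -t 00:30:00
$ srun --mpi=pmi2 ./hello_mpi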
I found the problem. It was between the chair and keyboard:
$ salloc -p general -q qos -t 00:30:00
When I type the qos right, it works:
$ salloc -p general -q debug -t 00:30:00 -A unix
salloc: Granted job allocation 529343
$ scontrol show job 529343 | grep QOS
Priority=13736 Nice=0 Accou
* Andy Georges [190715 16:17]:
>
> On Fri, Jul 12, 2019 at 03:21:31PM +0200, Juergen Salk wrote:
> > Dear all,
> >
> > I have configured pam_slurm_adopt in our Slurm test environment by
> > following the corresponding documentation:
> >
> > https://slurm.schedmd.com/pam_slurm_adopt.html
> >
> >
That explanation makes perfect sense, but after adding debug to my list
of QOSes in my associations, I still get the same error:
$ sacctmgr show user pbisbal withassoc -p
User|Def Acct|Admin|Cluster|Account|Partition|Share|MaxJobs|MaxNodes|MaxCPUs|MaxSubmit|MaxWall|MaxCPUMins|QOS|Def QOS|
pbi
$ scontrol show part general
PartitionName=general
AllowGroups=ALL AllowAccounts=ALL AllowQos=general,debug
AllocNodes=ALL Default=YES QoS=general
DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=300 Hidden=NO
MaxNodes=UNLIMITED MaxTime=2-00:00:00 MinNodes=0 LLN=NO
MaxC
I ran into this recently. You need to make sure your user account has
access to that QoS through sacctmgr. Right now I'd say that if you ran
sacctmgr show user withassoc, the QoS you're attempting to use is NOT
listed as part of the association.
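In practice the fix is then to add the QOS to the user's association and re-check; for example, using the user and QOS names from this thread (adjust as needed):

$ sacctmgr modify user where name=pbisbal set qos+=debug
$ sacctmgr show user pbisbal withassoc format=user,account,qos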
On Mon, Jul 15, 2019 at 2:53 PM Prentice Bisbal wrote:
On 7/15/19 11:22 AM, Prentice Bisbal wrote:
$ salloc -p general -q debug -t 00:30:00
salloc: error: Job submit/allocate failed: Invalid qos specification
what does:
scontrol show part general
say?
Also, does the user you're testing as have access to that QOS?
All the best,
Chris
--
Chris
I should add that I still get this error even when I remove the
"AllowQOS" attribute from the partition definition altogether:
$ salloc -p general -q debug -t 00:30:00
salloc: error: Job submit/allocate failed: Invalid qos specification
Prentice
On 7/15/19 2:22 PM, Prentice Bisbal wrote:
Slurm users,
Slurm users,
I have created a partition named general that should allow the QOSes
'general' and 'debug':
PartitionName=general Default=YES AllowQOS=general,debug Nodes=.
However, when I try to request that QOS, I get an error:
$ salloc -p general -q debug -t 00:30:00
salloc: error: Job submit/allocate failed: Invalid qos specification
Could it be a RHEL7 specific issue?
no - centos7 systems here, and pam_adopt works.
[hahn@gra799 ~]$ cat /proc/self/cgroup
11:memory:/slurm/uid_3000566/job_17268219/step_extern
10:net_prio,net_cls:/
9:pids:/
8:perf_event:/
7:hugetlb:/
6:blkio:/
5:freezer:/slurm/uid_3000566/job_17268219/step_e
On 7/12/19 6:21 AM, Juergen Salk wrote:
I suppose this is nevertheless the expected behavior and just the way
it is when using pam_slurm_adopt to restrict access to the compute
nodes? Is that right? Or did I miss something obvious?
Could it be a RHEL7 specific issue?
It looks like it's working.
Hi Juergen,
On Fri, Jul 12, 2019 at 03:21:31PM +0200, Juergen Salk wrote:
> Dear all,
>
> I have configured pam_slurm_adopt in our Slurm test environment by
> following the corresponding documentation:
>
> https://slurm.schedmd.com/pam_slurm_adopt.html
>
> I've set `PrologFlags=contain´ in slurm.conf
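For anyone following along, the setup described in that documentation boils down to roughly the following; the exact PAM file and option placement vary by distribution, so treat this as a sketch rather than a drop-in config:

In slurm.conf:
    PrologFlags=contain

In /etc/pam.d/sshd (or your distribution's equivalent):
    account    required    pam_slurm_adopt.so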
In the RELEASE_NOTES for 19.05.1-2, we read
HIGHLIGHTS
==
...
-- Changed the default fair share algorithm to "fair tree". To disable this
and revert to "classic" fair share you must set PriorityFlags=NO_FAIR_TREE.
...
Does that mean that having this setting in slurm.conf
PriorityFlags=FAIR_TREE
is now redundant, because it's the default?
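For reference, in slurm.conf terms the quoted note means the choice is now between these two settings, with the first being the 19.05 default (as the reply above confirms):

# explicit, same as the new default:
PriorityFlags=FAIR_TREE

# revert to the classic fair share algorithm:
PriorityFlags=NO_FAIR_TREE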
Getting an overview of available Slurm partitions and their current job
load is a non-trivial task.
The great "spart" tool described as "A user-oriented partition info
command for slurm" (https://github.com/mercanca/spart) written by Ahmet
Mercan solves this problem. The "spart" tool is writt
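A rough sketch of trying it out; the build step itself follows the project's README (which is not reproduced here, but presumably links against the Slurm libraries):

$ git clone https://github.com/mercanca/spart
$ cd spart
$ # build per the README, then:
$ ./spart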