Hello all,

I am not sure if I've stumbled upon a bug (Slurm 14.03.6) or if this is the intended behavior for an account tied to an association with a DefQOS.
Basically, I've configured our cluster to use associations with a DefQOS other than "normal" for specific accounts (priority, preemption, etc.). When I add a new user via "sacctmgr add user ..." or via "sacctmgr load file=..." and that new user submits a job, the default QOS I've configured is not respected. If I restart slurmd/slurmctld (service slurm restart) across the cluster and the job is submitted again, the default QOS is respected.

Here is the configuration I've set:

sacctmgr show association where account=faculty
   Cluster    Account       User  Partition     Share GrpJobs GrpNodes  GrpCPUs  GrpMem GrpSubmit     GrpWall  GrpCPUMins MaxJobs MaxNodes  MaxCPUs MaxSubmit     MaxWall  MaxCPUMins                  QOS   Def QOS GrpCPURunMins
---------- ---------- ---------- ---------- --------- ------- -------- -------- ------- --------- ----------- ----------- ------- -------- -------- --------- ----------- ----------- -------------------- --------- -------------
   c_slurm    faculty                             1    512   faculty   elevated,faculty,no+
   c_slurm    faculty davidrogers            parent    512   faculty   elevated,faculty,no+
   c_slurm    faculty        hlw             parent    512   faculty   elevated,faculty,no+
   c_slurm    faculty     sathan             parent    512   faculty   elevated,faculty,no+

Here is the user I've added:

sacctmgr add user pshudson defaultaccount=faculty fairshare=parent
 Adding User(s)
  pshudson
 Settings =
  Default Account = faculty
 Associations =
  U = pshudson  A = faculty   C = c_slur
 Non Default Settings
  Fairshare     = 2147483647
Would you like to commit changes? (You have 30 seconds to decide)
(N/y): y

Double-check the new entry:

sacctmgr show association where account=faculty
   Cluster    Account       User  Partition     Share GrpJobs GrpNodes  GrpCPUs  GrpMem GrpSubmit     GrpWall  GrpCPUMins MaxJobs MaxNodes  MaxCPUs MaxSubmit     MaxWall  MaxCPUMins                  QOS   Def QOS GrpCPURunMins
---------- ---------- ---------- ---------- --------- ------- -------- -------- ------- --------- ----------- ----------- ------- -------- -------- --------- ----------- ----------- -------------------- --------- -------------
   c_slurm    faculty                             1    512   faculty   elevated,faculty,no+
   c_slurm    faculty davidrogers            parent    512   faculty   elevated,faculty,no+
   c_slurm    faculty        hlw             parent    512   faculty   elevated,faculty,no+
   c_slurm    faculty   pshudson             parent    512   faculty   elevated,faculty,no+
   c_slurm    faculty     sathan             parent    512   faculty   elevated,faculty,no+

Here is the user immediately submitting a job:

[pshudson@host ~]$ salloc -N 4 -n 4 -t 01:00:00
salloc: Granted job allocation 24150

Here is the squeue output:

JOBID  ST PRIO QOS   PARTI NAME USER     ACCOUNT SUBMIT_TIME         START_TIME          TIME TIMELIMIT EXEC_HOST CPUS NODES MIN_M NODELIST(REASON)
24150  R  0.00 norma satur bash pshudson faculty 2014-10-15T14:40:46 2014-10-15T14:40:46 0:03 1:00:00   rcslurm   4    4     0     rack-5-[16-19]

As you can see, the QOS is "normal". I go ahead and restart slurmd/slurmctld across the cluster and resubmit:

[pshudson@host ~]$ salloc -N 4 -n 4 -t 01:00:00
salloc: Granted job allocation 24153

Here is the squeue output:

JOBID  ST PRIO QOS   PARTI NAME USER     ACCOUNT SUBMIT_TIME         START_TIME          TIME TIMELIMIT EXEC_HOST CPUS NODES MIN_M NODELIST(REASON)
24153  R  0.00 facul satur bash pshudson faculty 2014-10-15T14:42:23 2014-10-15T14:42:23 0:03 1:00:00   rcslurm   4    4     0     rack-5-[16-19]

The default QOS is now respected.

Is a restart of the slurmd/slurmctld daemons necessary (and simply undocumented), or is this a potential bug?

Thank you,
John DeSantis
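
P.S. In case it helps anyone reproduce this, here is roughly how the account-level DefQOS was set up and how I've been checking it. I'm paraphrasing from memory rather than pasting, so treat the exact QOS names and the verification commands below as approximate rather than a copy of our production settings:

  # Create the QOSes the account references (names approximate)
  sacctmgr add qos elevated
  sacctmgr add qos faculty

  # Attach them to the account and make "faculty" the account's default
  sacctmgr modify account where name=faculty set QOS=elevated,faculty,normal DefaultQOS=faculty

  # What the new user's association reports after the add
  sacctmgr show assoc where user=pshudson format=cluster,account,user,qos,defaultqos

  # The QOS actually assigned to the user's running job
  squeue -u pshudson -o "%i %q %a"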