Hello,

I'm running some tests with "associations" using "sacctmgr". I have created
three users (user_1, user_2 and user_3). For each of these users, I have
created one or more associations:

[root@myserver log]# sacctmgr show user user_1 --associations
      User   Def Acct     Admin    Cluster    Account  Partition     Share   Priority MaxJobs MaxNodes  MaxCPUs MaxSubmit     MaxWall  MaxCPUMins                  QOS   Def QOS
---------- ---------- --------- ---------- ---------- ---------- --------- ---------- ------- -------- -------- --------- ----------- ----------- -------------------- ---------
    user_1       test      None     q50004       test    aolin.q         1                  4        2       10                                                 normal
    user_1       test      None     q50004       test cuda-staf+         1                  4        2       10                                                 normal

[root@myserver log]# sacctmgr show user user_2 --associations
      User   Def Acct     Admin    Cluster    Account  Partition     Share   Priority MaxJobs MaxNodes  MaxCPUs MaxSubmit     MaxWall  MaxCPUMins                  QOS   Def QOS
---------- ---------- --------- ---------- ---------- ---------- --------- ---------- ------- -------- -------- --------- ----------- ----------- -------------------- ---------
    user_2       test      None     q50004       test cuda-int.q         1                                    4                                                 normal

[root@myserver log]# sacctmgr show user user_3 --associations
      User   Def Acct     Admin    Cluster    Account  Partition     Share   Priority MaxJobs MaxNodes  MaxCPUs MaxSubmit     MaxWall  MaxCPUMins                  QOS   Def QOS
---------- ---------- --------- ---------- ---------- ---------- --------- ---------- ------- -------- -------- --------- ----------- ----------- -------------------- ---------
    user_3       test      None     q50004       test research.q         1                           2        1                                                 normal
    user_3       test      None     q50004       test     xeon.q         1                           2        1                                                 normal

All users belong to the "test" account:
[root@myserver log]# sacctmgr show account test --association
   Account                Descr                  Org    Cluster ParentName       User     Share   Priority GrpJobs GrpNodes  GrpCPUs  GrpMem GrpSubmit     GrpWall  GrpCPUMins MaxJobs MaxNodes  MaxCPUs MaxSubmit     MaxWall  MaxCPUMins                  QOS   Def QOS
---------- -------------------- -------------------- ---------- ---------- ---------- --------- ---------- ------- -------- -------- ------- --------- ----------- ----------- ------- -------- -------- --------- ----------- ----------- -------------------- ---------
      test                 test                 test     q50004       root                    1                                                                                                                                                          normal
      test                 test                 test     q50004                user_1         1                                                                                      4        2       10                                                 normal
      test                 test                 test     q50004                user_1         1                                                                                      4        2       10                                                 normal
      test                 test                 test     q50004                user_2         1                                                                                                        4                                                 normal
      test                 test                 test     q50004                user_3         1                                                                                               2        1                                                 normal
      test                 test                 test     q50004                user_3         1                                                                                               2        1                                                 normal


When I submit as "user_1", everything works as expected: some jobs are queued
and executed, and others are rejected because of the limits.
However, with "user_2" and "user_3" I can't get jobs to run. They are either
rejected or stay pending, for example:
     11168 research.     test          user_3  PENDING         0:00  2024-04-17T12:53:21                  N/A    1    1     OK                  N/A (AssocMaxCpuPerJo (null)
     11173 research.     test          user_3  PENDING         0:00  2024-04-17T13:06:02                  N/A    1    1     OK                  N/A (AssocMaxCpuPerJo (null)
     11174 research.     test          user_3  PENDING         0:00  2024-04-17T13:06:16                  N/A    1    1     OK                  N/A (AssocMaxCpuPerJo (null)
     11176 research.     test          user_3  PENDING         0:00  2024-04-17T13:07:23                  N/A    1    1     OK                  N/A (AssocMaxCpuPerJo (null)
     11180 research.     test          user_3  PENDING         0:00  2024-04-17T13:08:45                  N/A    1    1     OK                  N/A (AssocMaxCpuPerJo (null)

For example, "user_3" tries to submit jobs as follows (the test.sh script is
just a simple "sleep 50"):
sbatch -p aolin.q -N 2 ./test.sh --> sbatch: error: Batch job submission 
failed: Invalid account or account/partition combination specified
sbatch -p aolin.q -N 1 ./test.sh --> sbatch: error: Batch job submission 
failed: Invalid account or account/partition combination specified
sbatch -p research.q -N 1 ./test.sh --> submitted but not running --> 
nodelist(reason)= (AssocMaxCpuPerJobLimit) -> WHY???
sbatch -p research.q -N 1 -n 1 ./test.sh --> submitted but not running --> 
nodelist(reason)= (AssocMaxCpuPerJobLimit) --> WHY???
sbatch -p xeon.q -N 1 -n 1 ./test.sh --> submitted and running!!
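
To make my expectation explicit, here is a minimal Python sketch of how I read
the limits above. It is only my mental model, not how slurmctld actually
evaluates associations. The names Assoc, ASSOCS and expected_outcome are made
up for this illustration, and it assumes I am reading the columns correctly
(for user_3: MaxNodes=2 and MaxCPUs=1 on both partitions).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Assoc:
    """Per-job limits of one association; None means the limit is not set."""
    max_jobs: Optional[int] = None
    max_nodes: Optional[int] = None
    max_cpus: Optional[int] = None


# Limits as I read them from the sacctmgr output (account "test").
# "cuda-staf+" is the partition name exactly as sacctmgr truncated it.
ASSOCS = {
    ("user_1", "aolin.q"): Assoc(max_jobs=4, max_nodes=2, max_cpus=10),
    ("user_1", "cuda-staf+"): Assoc(max_jobs=4, max_nodes=2, max_cpus=10),
    ("user_2", "cuda-int.q"): Assoc(max_cpus=4),
    ("user_3", "research.q"): Assoc(max_nodes=2, max_cpus=1),
    ("user_3", "xeon.q"): Assoc(max_nodes=2, max_cpus=1),
}


def expected_outcome(user: str, partition: str, nodes: int = 1, cpus: int = 1) -> str:
    """Naive expectation for a single job (hypothetical helper, not Slurm logic)."""
    assoc = ASSOCS.get((user, partition))
    if assoc is None:
        # No association for this user/partition pair.
        return "rejected: invalid account/partition combination"
    if assoc.max_nodes is not None and nodes > assoc.max_nodes:
        return "held: over MaxNodes"
    if assoc.max_cpus is not None and cpus > assoc.max_cpus:
        return "held: over MaxCPUs"
    return "should run"


if __name__ == "__main__":
    # The user_3 submissions listed above. For "-N 2" without "-n" I assume
    # at least one CPU per node, i.e. 2 CPUs.
    for partition, nodes, cpus in [
        ("aolin.q", 2, 2),
        ("aolin.q", 1, 1),
        ("research.q", 1, 1),
        ("xeon.q", 1, 1),
    ]:
        print(partition, nodes, cpus, "->",
              expected_outcome("user_3", partition, nodes, cpus))
```

Under this reading, the two research.q submissions should run just like the
xeon.q one, which is why the AssocMaxCpuPerJobLimit reason confuses me.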

[root@myserver log]# squeue
     JOBID PARTITION     NAME            USER    STATE         TIME          SUBMIT_TIME           START_TIME NODE CPUS OVER_S        TRES_PER_NODE NODELIST(REASON)  DEPENDENCY        REQ_NODES   NODELIST
     11202 research.     test          user_3  PENDING         0:00  2024-04-17T13:33:31                  N/A    1    1     OK                  N/A (AssocMaxCpuPerJo (null)
     11200 research.     test          user_3  PENDING         0:00  2024-04-17T13:33:17                  N/A    1    1     OK                  N/A (AssocMaxCpuPerJo (null)
     11212    xeon.q     test          user_3  RUNNING         0:18  2024-04-17T13:36:10  2024-04-17T13:36:10    1    1     OK                  N/A aolin-cpu-1       (null)             aolin-cpu-1

Why? What am I doing wrong? Where is the limit that I am not seeing?


Thanks a lot!
