Greetings. I am using Rocks Cluster Distribution 6.1 and Grid Engine 2011.11p1. All our simulations are submitted to the queue using the following command format:

    qsub -p N SUBMISSION_SCRIPT.sh

N is a negative integer ranging from -1 through -60 (we consider this the "priority" of a research group). Until about a week ago, everything worked fine.

After noticing some simulations waiting in the queue longer than usual (my own group's priority, for example, is -41), I submitted 60 simulations with priority values -1, -2, -3, ..., -60. Simulations with priority values -1 through -26 ran just fine. Those with a -p value of -27 and below just stay in the 'qw' state.

The usual 'qstat -j SIM_ID' command does not say why a job is not running (please see below the output for a simulation with priority -27). Processors/slots are free and available in long.q.

As far as I understand the Grid Engine documentation, -p values range from -1023 through 1024, and users who are not operators/admins are restricted to 0 through -1023.

Any help in debugging/identifying the cause of this problem will be greatly appreciated.

****************************************************************************************
job_number:                 481703
exec_file:                  job_scripts/481703
submission_time:            Mon Aug 6 12:48:07 2018
owner:                      john
uid:                        38025
group:                      jane-users
gid:                        506
sge_o_home:                 /home/john
sge_o_log_name:             john
sge_o_path:                 :/bin:/usr/bin:/usr/kerberos/bin:/share/apps/bin:/share/apps/sbin:/usr/X11R6/bin:/usr/java/latest/bin:/sbin:/usr/sbin:/usr/kerberos/sbin:/opt/gridengine/bin/lx26-amd64:/opt/gridengine/bin/linux-x64:/home/john/bin:/opt/ganglia/bin:/opt/rocks/bin:/opt/rocks/sbin
sge_o_shell:                /bin/bash
sge_o_tz:                   America/Detroit
sge_o_workdir:              /misc/research/john/test_runs
sge_o_host:                 login-0-2
account:                    sge
cwd:                        /misc/research/john/test_runs
merge:                      y
hard resource_list:         mem_free=2G
mail_list:                  john@login-0-1.local
notify:                     TRUE
job_name:                   test_p27.sh
priority:                   -27
jobshare:                   0
hard_queue_list:            long.q
shell_list:                 NONE:/bin/bash
env_list:
script_file:                test_p27.sh
scheduling info:            queue instance "long.q@compute-0-48.local" dropped because it is disabled
                            queue instance "long.q@compute-0-66.local" dropped because it is disabled
                            queue instance "long.q@compute-0-65.local" dropped because it is disabled
                            queue instance "long.q@compute-0-20.local" dropped because it is disabled
                            queue instance "long.q@compute-0-64.local" dropped because it is disabled
                            queue instance "repair.q@compute-0-36.local" dropped because it is disabled
                            queue instance "long.q@compute-0-63.local" dropped because it is full
                            queue instance "long.q@compute-0-50.local" dropped because it is full
                            ...
                            queue instance "long.q@compute-0-33.local" dropped because it is full
                            queue instance "long.q@compute-0-31.local" dropped because it is full
                            queue instance "long.q@compute-0-35.local" dropped because it is full
                            queue instance "long.q@compute-0-10.local" dropped because it is full
                            queue instance "long.q@compute-0-43.local" dropped because it is full
                            queue instance "short.q@compute-0-1.local" dropped because it is full
                            queue instance "short.q@compute-0-2.local" dropped because it is full
                            queue instance "short.q@compute-0-3.local" dropped because it is full
                            queue instance "short.q@compute-0-0.local" dropped because it is full
                            queue instance "medium.q@compute-0-6.local" dropped because it is full
                            queue instance "medium.q@compute-0-7.local" dropped because it is full
                            queue instance "medium.q@compute-0-5.local" dropped because it is full
                            queue instance "medium.q@compute-0-4.local" dropped because it is full
****************************************************************************************

Best regards,
Gowtham

--
Gowtham, PhD
Director of Research Computing, IT
Research Associate Professor, ECE
Michigan Technological University

P: (906) 487-4096
F: (906) 487-2787
https://it.mtu.edu
https://hpc.mtu.edu