Greetings.

I am using Rocks Cluster Distribution 6.1 and Grid Engine 2011.11p1. All
our simulations are submitted to the queue using the following command
format:

qsub -p N SUBMISSION_SCRIPT.sh

N is a negative integer ranging from -1 through -60 (we consider this the
"priority" of a research group).

Until about a week ago, everything worked fine. After noticing some
simulations waiting in the queue for longer than normal (for example, my
own group's priority is -41), I submitted 60 test simulations with
priority values -1, -2, -3, ..., -60.
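
A loop along the following lines would reproduce this test. It is only a
sketch: the per-priority script names follow the test_p27.sh pattern from
the job output further down, and the DRY_RUN switch and the function name
are made up for illustration (with DRY_RUN=1 the commands are printed
rather than submitted).

```shell
#!/bin/bash
# Sketch: submit one test job per priority value from -1 through -N.
# Assumptions (illustrative, not the exact commands used): one script per
# priority named test_pN.sh, and a hypothetical DRY_RUN switch that prints
# the qsub commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

submit_priority_sweep() {
    local lowest="${1:-60}"   # magnitude of the most negative priority
    local n
    for n in $(seq 1 "$lowest"); do
        if [ "$DRY_RUN" = "1" ]; then
            echo "qsub -p -${n} test_p${n}.sh"
        else
            qsub -p "-${n}" "test_p${n}.sh"
        fi
    done
}

submit_priority_sweep 60
```

Setting DRY_RUN=0 makes the same loop call qsub for real.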

Simulations with priority values down to -26 ran just fine. Those with a
-p value of -27 or lower stay in the 'qw' state. The usual 'qstat -j
SIM_ID' command gives no indication of why such a job is not running
(please see below the output for a simulation with priority -27).
Processors/slots are free and available in long.q.
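
To see at a glance which test jobs are stuck, something like the sketch
below can filter a qstat listing. It assumes the default qstat layout (two
header lines, job name in column 3, state in column 5); the function name
list_pending_jobs is hypothetical.

```shell
#!/bin/bash
# Sketch: print the name and state of every job in a qstat listing whose
# state contains "qw" (qw, hqw, Eqw). Assumes the default qstat layout:
# two header lines, then rows with the job name in column 3 and the state
# in column 5.
list_pending_jobs() {
    awk 'NR > 2 && $5 ~ /qw/ { print $3, $5 }'
}

# Only query the scheduler when qstat is actually on PATH.
if command -v qstat >/dev/null 2>&1; then
    qstat -u "${USER:-$(id -un)}" | list_pending_jobs
fi
```

Note that qstat shortens long job names in its default listing, so the
names printed here may be truncated.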

As far as I understand the Grid Engine documentation, -p values range
from -1023 through 1024, and users who are not operators/admins are
restricted to 0 through -1023.

Any help in debugging/identifying the cause of this problem will be greatly
appreciated.

****************************************************************************************
job_number:                 481703
exec_file:                  job_scripts/481703
submission_time:            Mon Aug  6 12:48:07 2018
owner:                      john
uid:                        38025
group:                      jane-users
gid:                        506
sge_o_home:                 /home/john
sge_o_log_name:             john
sge_o_path:                 :/bin:/usr/bin:/usr/kerberos/bin:/share/apps/bin:/share/apps/sbin:/usr/X11R6/bin:/usr/java/latest/bin:/sbin:/usr/sbin:/usr/kerberos/sbin:/opt/gridengine/bin/lx26-amd64:/opt/gridengine/bin/linux-x64:/home/john/bin:/opt/ganglia/bin:/opt/rocks/bin:/opt/rocks/sbin
sge_o_shell:                /bin/bash
sge_o_tz:                   America/Detroit
sge_o_workdir:              /misc/research/john/test_runs
sge_o_host:                 login-0-2
account:                    sge
cwd:                        /misc/research/john/test_runs
merge:                      y
hard resource_list:         mem_free=2G
mail_list:                  john@login-0-1.local
notify:                     TRUE
job_name:                   test_p27.sh
priority:                   -27
jobshare:                   0
hard_queue_list:            long.q
shell_list:                 NONE:/bin/bash
env_list:
script_file:                test_p27.sh
scheduling info:            queue instance "long.q@compute-0-48.local" dropped because it is disabled
                            queue instance "long.q@compute-0-66.local" dropped because it is disabled
                            queue instance "long.q@compute-0-65.local" dropped because it is disabled
                            queue instance "long.q@compute-0-20.local" dropped because it is disabled
                            queue instance "long.q@compute-0-64.local" dropped because it is disabled
                            queue instance "repair.q@compute-0-36.local" dropped because it is disabled
                            queue instance "long.q@compute-0-63.local" dropped because it is full
                            queue instance "long.q@compute-0-50.local" dropped because it is full
                            ...
                            queue instance "long.q@compute-0-33.local" dropped because it is full
                            queue instance "long.q@compute-0-31.local" dropped because it is full
                            queue instance "long.q@compute-0-35.local" dropped because it is full
                            queue instance "long.q@compute-0-10.local" dropped because it is full
                            queue instance "long.q@compute-0-43.local" dropped because it is full
                            queue instance "short.q@compute-0-1.local" dropped because it is full
                            queue instance "short.q@compute-0-2.local" dropped because it is full
                            queue instance "short.q@compute-0-3.local" dropped because it is full
                            queue instance "short.q@compute-0-0.local" dropped because it is full
                            queue instance "medium.q@compute-0-6.local" dropped because it is full
                            queue instance "medium.q@compute-0-7.local" dropped because it is full
                            queue instance "medium.q@compute-0-5.local" dropped because it is full
                            queue instance "medium.q@compute-0-4.local" dropped because it is full
****************************************************************************************


Best regards,
Gowtham

--
Gowtham, PhD
Director of Research Computing, IT
Research Associate Professor, ECE
Michigan Technological University

P: (906) 487-4096
F: (906) 487-2787
https://it.mtu.edu
https://hpc.mtu.edu