Thanks,
after what you said, it looks like this is something the user is doing. But
she says some of her jobs are working and some are being dumped because
"it's full".
On 09/28/2016 09:41 AM, Chris Dagdigian wrote:
I think the "queue instance dropped because ... full" is not related
to your user/job problem. The dropped message is a sign from the job
placement process that the queue instance was skipped during the
active host select-and-job-dispatch round because it had no more job
slots free to take new work. This would be a normal status alert on an
active cluster with lots of jobs in 'qw' state. No big deal basically
unless you think a resource, quota or some other thing is interfering.
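One way to confirm that the "dropped because it is full" messages only reflect busy slots is to look at slot usage per queue instance. The following is a minimal sketch, assuming the usual `qstat -f` layout in which the third column reads resv/used/tot. (older releases may print only used/tot.); the helper name `full_queue_instances` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read `qstat -f` style text on stdin and print the
# queue instances whose used slot count has reached the total.
full_queue_instances() {
    awk '$1 ~ /@/ {
        # Column 3 is assumed to be resv/used/tot. (or used/tot.);
        # the last two numbers are used and total slots.
        n = split($3, s, "/")
        if (n >= 2 && s[n] + 0 > 0 && s[n-1] + 0 >= s[n] + 0) print $1
    }'
}

# On a live cluster this might be run as:
#   qstat -f | full_queue_instances
```

If the queue instances named in the "dropped" messages show up here, the scheduler is simply reporting that they have no free slots.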
State "Eqw" is usually a sign that something went badly wrong with a
job. It's usually a sign of a significant issue like the UID/GID of the
user not existing on the execution host or similar or it could be as
simple as user error in a script (permission denied, path not found,
etc.).
What does "qstat -j <jobID>" tell you about the jobs in Eqw state? Any
interesting spool logs from the compute nodes or qmaster?
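With this many jobs in Eqw it may be easier to collect the job IDs first and then query each one. A rough sketch, assuming the default `qstat` listing where the state is the fifth column; the helper name `eqw_job_ids` is invented for this example.

```shell
#!/bin/sh
# Hypothetical helper: read `qstat` style text on stdin and print the
# job IDs whose state column (assumed to be field 5) is Eqw.
eqw_job_ids() {
    awk '$5 == "Eqw" { print $1 }'
}

# On a live cluster this might be combined with qstat -j, for example:
#   qstat -u username | eqw_job_ids | while read -r id; do
#       qstat -j "$id" | grep -i 'error reason'
#   done
# The "error reason" label is what qstat -j is expected to print for a
# job in error state; if the grep finds nothing, read the full output.
```

Once the underlying cause is fixed, `qmod -cj <jobID>` clears the error state so the job can be scheduled again.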
Chris
Dan Hyatt wrote:
I am trying to narrow down what would cause this. I searched Google
and the SGE resources and could not find a reason for:
queue instance "VeryHighMem@blade5-5-8" dropped because it is full
queue instance "HighMem@blade5-1-4" dropped because it is full
This is that one user almost every shop has who is incredible at her
work, but causes about 90% of the technical problems because of bad
choices.
Why would SGE queue the jobs for everyone else, but for this user
suddenly drop jobs "because it is full"?
I have lots of jobs that went to "Eqw", as shown in the following:
1144122 0.55500 sas64 username Eqw 09/27/2016 22:54:45 1
1144125 0.55500 sas64 username Eqw 09/27/2016 22:55:35 1
1144127 0.55500 sas64 username Eqw 09/27/2016 22:56:25 1
1144130 0.55500 sas64 username Eqw 09/27/2016 22:57:15 1
1144134 0.55500 sas64 username Eqw 09/27/2016 22:58:05 1
1144139 0.55500 sas64 username Eqw 09/27/2016 22:58:55 1
1144142 0.55500 sas64 username Eqw 09/27/2016 22:59:46 1
1144145 0.55500 sas64 username Eqw 09/27/2016 23:00:36 1
1144151 0.55500 sas64 username Eqw 09/27/2016 23:01:26 1
1144156 0.55500 sas64 username Eqw 09/27/2016 23:02:16 1
1144161 0.55500 sas64 username Eqw 09/27/2016 23:03:06 1
1144165 0.55500 sas64 username Eqw 09/27/2016 23:03:56 1
1144169 0.55500 sas64 username Eqw 09/27/2016 23:04:46 1
1144174 0.55500 sas64 username Eqw 09/27/2016 23:05:36 1
1144177 0.55500 sas64 username Eqw 09/27/2016 23:06:26 1
1144182 0.55500 sas64 username Eqw 09/27/2016 23:07:17 1
1144186 0.55500 sas64 username Eqw 09/27/2016 23:08:07 1
1144196 0.55500 sas64 username Eqw 09/27/2016 23:08:57 1
1144204 0.55500 sas64 username Eqw 09/27/2016 23:09:47 1
1144212 0.55500 sas64 username Eqw 09/27/2016 23:10:37 1
1144217 0.55500 sas64 username Eqw 09/27/2016 23:11:27 1
1144221 0.55500 sas64 username Eqw 09/27/2016 23:12:17 1
1144224 0.55500 sas64 username Eqw 09/27/2016 23:13:08 1
1144225 0.55500 sas64 username Eqw 09/27/2016 23:13:58 1
1144227 0.55500 sas64 username Eqw 09/27/2016 23:14:48 1
1144232 0.55500 sas64 username Eqw 09/27/2016 23:15:38 1
1144236 0.55500 sas64 username Eqw 09/27/2016 23:16:28 1
1144244 0.55500 sas64 username Eqw 09/27/2016 23:17:18 1
1144255 0.55500 sas64 username Eqw 09/27/2016 23:18:09 1
1144265 0.55500 sas64 username Eqw 09/27/2016 23:18:59 1
1144276 0.55500 sas64 username Eqw 09/27/2016 23:19:49 1
1144286 0.55500 sas64 username Eqw 09/27/2016 23:20:39 1
1144295 0.55500 sas64 username Eqw 09/27/2016 23:21:29 1
1144306 0.55500 sas64 username Eqw 09/27/2016 23:22:19 1
1144316 0.55500 sas64 username Eqw 09/27/2016 23:23:09 1
1144326 0.55500 sas64 username Eqw 09/27/2016 23:23:59 1
1144335 0.55500 sas64 username Eqw 09/27/2016 23:24:49 1
1144344 0.55500 sas64 username Eqw 09/27/2016 23:25:39 1
1144351 0.55500 sas64 username Eqw 09/27/2016 23:26:30 1
1144359 0.55500 sas64 username Eqw 09/27/2016 23:27:20 1
1144366 0.55500 sas64 username Eqw 09/27/2016 23:28:10 1
1144374 0.55500 sas64 username Eqw 09/27/2016 23:29:00 1
1144416 0.55500 sas64 username Eqw 09/27/2016 23:29:50 1
1144482 0.55500 sas64 username Eqw 09/27/2016 23:30:40 1
1144484 0.55500 sas64 username Eqw 09/27/2016 23:31:30 1
1144485 0.55500 sas64 username Eqw 09/27/2016 23:32:20 1
1144486 0.55500 sas64 username Eqw 09/27/2016 23:33:10 1
1144487 0.55500 sas64 username Eqw 09/27/2016 23:34:00 1
1144491 0.55500 sas64 username Eqw 09/27/2016 23:34:51 1
1144498 0.55500 sas64 username Eqw 09/27/2016 23:35:41 1
1144499 0.55500 sas64 username Eqw 09/27/2016 23:36:31 1
1144500 0.55500 sas64 username Eqw 09/27/2016 23:37:21 1
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users