> On 28.09.2016 at 17:06, Dan Hyatt <dhy...@dsgmail.wustl.edu> wrote:
>
> Thanks,
>
> What you said suggests it is something the user is doing. But she says
> some of the jobs are working and some are being dumped because it's full.
Maybe by "full" she means the disk space on the nodes, not any output from SGE.

-- Reuti


> On 09/28/2016 09:41 AM, Chris Dagdigian wrote:
>>
>> I think the "queue instance dropped because ... full" message is not related
>> to your user/job problem. The dropped message is a sign from the job
>> placement process that the queue instance was skipped during the active
>> host-select-and-job-dispatch round because it had no free job slots left to
>> take new work. This would be a normal status alert on an active cluster with
>> lots of jobs in 'qw' state. Basically no big deal, unless you think a
>> resource, quota or something else is interfering.
>>
>> State "Eqw" usually means that something went badly wrong with a job. It's
>> often a sign of a significant issue, such as the user's UID/GID not existing
>> on the execution host, or it could be as simple as a user error in a script
>> (permission denied, path not found, etc.).
>>
>> What does "qstat -j <jobID>" tell you about the jobs in Eqw state? Any
>> interesting spool logs from the compute nodes or qmaster?
>>
>> Chris
>>
>>
>> Dan Hyatt wrote:
>>>
>>> I am trying to narrow down what would cause this. I searched Google and the
>>> SGE resources and could not find a reason for:
>>>
>>> queue instance "VeryHighMem@blade5-5-8" dropped because it is full
>>> queue instance "HighMem@blade5-1-4" dropped because it is full
>>>
>>> This is the one user almost every shop has: incredible at her work, but
>>> the cause of about 90% of the technical problems because of bad choices.
>>>
>>> Why would SGE queue the jobs for everyone else, but suddenly drop this
>>> user's jobs "because it's full"?
>>>
>>> Lots of jobs went to "Eqw", as shown in the following:
>>> 1144122 0.55500 sas64  username  Eqw  09/27/2016 22:54:45  1
>>> 1144125 0.55500 sas64  username  Eqw  09/27/2016 22:55:35  1
>>> 1144127 0.55500 sas64  username  Eqw  09/27/2016 22:56:25  1
>>> 1144130 0.55500 sas64  username  Eqw  09/27/2016 22:57:15  1
>>> 1144134 0.55500 sas64  username  Eqw  09/27/2016 22:58:05  1
>>> 1144139 0.55500 sas64  username  Eqw  09/27/2016 22:58:55  1
>>> 1144142 0.55500 sas64  username  Eqw  09/27/2016 22:59:46  1
>>> 1144145 0.55500 sas64  username  Eqw  09/27/2016 23:00:36  1
>>> 1144151 0.55500 sas64  username  Eqw  09/27/2016 23:01:26  1
>>> 1144156 0.55500 sas64  username  Eqw  09/27/2016 23:02:16  1
>>> 1144161 0.55500 sas64  username  Eqw  09/27/2016 23:03:06  1
>>> 1144165 0.55500 sas64  username  Eqw  09/27/2016 23:03:56  1
>>> 1144169 0.55500 sas64  username  Eqw  09/27/2016 23:04:46  1
>>> 1144174 0.55500 sas64  username  Eqw  09/27/2016 23:05:36  1
>>> 1144177 0.55500 sas64  username  Eqw  09/27/2016 23:06:26  1
>>> 1144182 0.55500 sas64  username  Eqw  09/27/2016 23:07:17  1
>>> 1144186 0.55500 sas64  username  Eqw  09/27/2016 23:08:07  1
>>> 1144196 0.55500 sas64  username  Eqw  09/27/2016 23:08:57  1
>>> 1144204 0.55500 sas64  username  Eqw  09/27/2016 23:09:47  1
>>> 1144212 0.55500 sas64  username  Eqw  09/27/2016 23:10:37  1
>>> 1144217 0.55500 sas64  username  Eqw  09/27/2016 23:11:27  1
>>> 1144221 0.55500 sas64  username  Eqw  09/27/2016 23:12:17  1
>>> 1144224 0.55500 sas64  username  Eqw  09/27/2016 23:13:08  1
>>> 1144225 0.55500 sas64  username  Eqw  09/27/2016 23:13:58  1
>>> 1144227 0.55500 sas64  username  Eqw  09/27/2016 23:14:48  1
>>> 1144232 0.55500 sas64  username  Eqw  09/27/2016 23:15:38  1
>>> 1144236 0.55500 sas64  username  Eqw  09/27/2016 23:16:28  1
>>> 1144244 0.55500 sas64  username  Eqw  09/27/2016 23:17:18  1
>>> 1144255 0.55500 sas64  username  Eqw  09/27/2016 23:18:09  1
>>> 1144265 0.55500 sas64  username  Eqw  09/27/2016 23:18:59  1
>>> 1144276 0.55500 sas64  username  Eqw  09/27/2016 23:19:49  1
>>> 1144286 0.55500 sas64  username  Eqw  09/27/2016 23:20:39  1
>>> 1144295 0.55500 sas64  username  Eqw  09/27/2016 23:21:29  1
>>> 1144306 0.55500 sas64  username  Eqw  09/27/2016 23:22:19  1
>>> 1144316 0.55500 sas64  username  Eqw  09/27/2016 23:23:09  1
>>> 1144326 0.55500 sas64  username  Eqw  09/27/2016 23:23:59  1
>>> 1144335 0.55500 sas64  username  Eqw  09/27/2016 23:24:49  1
>>> 1144344 0.55500 sas64  username  Eqw  09/27/2016 23:25:39  1
>>> 1144351 0.55500 sas64  username  Eqw  09/27/2016 23:26:30  1
>>> 1144359 0.55500 sas64  username  Eqw  09/27/2016 23:27:20  1
>>> 1144366 0.55500 sas64  username  Eqw  09/27/2016 23:28:10  1
>>> 1144374 0.55500 sas64  username  Eqw  09/27/2016 23:29:00  1
>>> 1144416 0.55500 sas64  username  Eqw  09/27/2016 23:29:50  1
>>> 1144482 0.55500 sas64  username  Eqw  09/27/2016 23:30:40  1
>>> 1144484 0.55500 sas64  username  Eqw  09/27/2016 23:31:30  1
>>> 1144485 0.55500 sas64  username  Eqw  09/27/2016 23:32:20  1
>>> 1144486 0.55500 sas64  username  Eqw  09/27/2016 23:33:10  1
>>> 1144487 0.55500 sas64  username  Eqw  09/27/2016 23:34:00  1
>>> 1144491 0.55500 sas64  username  Eqw  09/27/2016 23:34:51  1
>>> 1144498 0.55500 sas64  username  Eqw  09/27/2016 23:35:41  1
>>> 1144499 0.55500 sas64  username  Eqw  09/27/2016 23:36:31  1
>>> 1144500 0.55500 sas64  username  Eqw  09/27/2016 23:37:21  1
>>> _______________________________________________
>>> users mailing list
>>> users@gridengine.org
>>> https://gridengine.org/mailman/listinfo/users
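Chris suggests running "qstat -j <jobID>" on the jobs stuck in Eqw. A small helper like the one below could pick the error-state job IDs out of a plain `qstat` listing like Dan's and build those commands. This is a sketch under stated assumptions. It expects the column order of the listing above: job-ID, priority, name, user, state, submit date and time, then queue and slots. It also expects job names without spaces. The helper names `error_jobs` and `diagnosis_commands` are hypothetical and are not part of Grid Engine.

```python
import re

# One row of a default `qstat` job listing: job-ID, priority, name, user,
# state, submit date, submit time (queue and slots follow but are ignored).
# Assumption: whitespace-separated columns and job names without spaces.
ROW = re.compile(
    r"^\s*(\d+)\s+[\d.]+\s+\S+\s+(\S+)\s+(\S+)\s+"
    r"\d{2}/\d{2}/\d{4}\s+\d{2}:\d{2}:\d{2}"
)


def error_jobs(qstat_text, user=None):
    """Return job IDs whose state contains 'E' (e.g. 'Eqw'), optionally for one user."""
    ids = []
    for line in qstat_text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # header, separator line, or anything not shaped like a job row
        job_id, owner, state = m.groups()
        if "E" in state and (user is None or owner == user):
            ids.append(job_id)
    return ids


def diagnosis_commands(job_ids):
    """Build one `qstat -j <jobID>` command per job, as suggested in the thread."""
    return [f"qstat -j {job_id}" for job_id in job_ids]
```

For example, you could save the output of `qstat -u username` to a file and pass its text to `error_jobs(..., user="username")`. Then run the resulting `qstat -j` commands to see why each job went into the error state. Working on captured text, rather than calling `qstat` directly, keeps the sketch usable on a machine without Grid Engine installed.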