> On 28.09.2016 at 17:06, Dan Hyatt <dhy...@dsgmail.wustl.edu> wrote:
> 
> Thanks,
> 
> What you said suggests it is something the user is doing. But she says 
> some of the jobs are working and some are being dumped because it's 
> full.

Maybe by "full" she means the disk space on the nodes rather than any output 
of SGE.

-- Reuti
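
One way to follow up on the disk-space idea is to look at `df -P` output on the node in question. The helper below is only a sketch: the function name `full_filesystems` and the 95% default threshold are assumptions made for illustration, not anything from SGE. It reads `df -P` output on stdin and prints the mount points at or above the threshold, and it assumes mount points contain no spaces.

```shell
# Hypothetical helper: read `df -P` output on stdin and print every
# mount point whose usage is at or above the given percentage.
# The threshold defaults to 95 (an arbitrary choice for this sketch).
full_filesystems() {
    awk -v limit="${1:-95}" '
        NR > 1 {                      # skip the df header line
            use = $5                  # Capacity column, e.g. "87%"
            sub(/%/, "", use)         # strip the percent sign
            if (use + 0 >= limit)
                print $6, $5          # mount point and usage
        }'
}

# Example use on an execution host (not run here):
#   df -P | full_filesystems 95
```

If this prints nothing on the affected nodes, disk space is probably not what the user means by "full".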


> On 09/28/2016 09:41 AM, Chris Dagdigian wrote:
>> 
>> I think the "queue instance dropped because ... full" is not related to your 
>> user/job problem. The dropped message is a sign from the job placement 
>> process that the queue instance was skipped during the active host 
>> select-and-job-dispatch round because it had no more job slots free to take 
>> new work. This would be a normal status alert on an active cluster with lots 
>> of jobs in 'qw' state. No big deal basically unless you think a resource, 
>> quota or some other thing is interfering.
>> 
>> State "Eqw" is usually a sign that something went badly wrong with a job. 
>> It usually points to a significant issue, such as the user's UID/GID not 
>> existing on the execution host, or it could be as simple as a user 
>> error in a script (permission denied, path not found, etc.).
>> 
>> What does "qstat -j <jobID>" tell you about the jobs in Eqw state? Any 
>> interesting spool logs from the compute nodes or qmaster?
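
With this many jobs in Eqw, running "qstat -j" by hand gets tedious. The sketch below pulls the job IDs out of a qstat listing so they can be looped over. The function name `eqw_job_ids` is hypothetical, and it assumes the default qstat column layout, where the state is the fifth whitespace-separated field.

```shell
# Hypothetical helper: read a qstat job listing on stdin and print the
# job ID (column 1) of every job whose state (column 5) is exactly "Eqw".
# Header and separator lines never match, so they are skipped.
eqw_job_ids() {
    awk '$5 == "Eqw" { print $1 }'
}

# Example use on a submit host (not run here; "username" is a placeholder):
#   qstat -u username | eqw_job_ids | while read -r id; do
#       echo "== job $id"
#       qstat -j "$id" | grep 'error reason'
#   done
```

Once the underlying cause is fixed, "qmod -cj <jobID>" clears the error state so the job can be scheduled again.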
>> 
>> Chris
>> 
>> 
>> 
>> 
>> Dan Hyatt wrote:
>>> 
>>> I am trying to narrow down what would cause this. I searched Google and the 
>>> SGE resources and could not find a reason for
>>> 
>>>  queue instance "VeryHighMem@blade5-5-8" dropped because it is full
>>>  queue instance "HighMem@blade5-1-4" dropped because it is full
>>> 
>>> This is that one user almost every shop has who is incredible at their work, 
>>> but causes about 90% of the technical problems because of bad choices.
>>> 
>>> 
>>> Why would SGE queue the jobs for everyone else, but with this user suddenly 
>>> drop jobs "because it's full"?
>>> 
>>> Lots of jobs went to "Eqw", as shown in the following:
>>> 1144122 0.55500 sas64      username       Eqw   09/27/2016 22:54:45     1
>>> 1144125 0.55500 sas64      username       Eqw   09/27/2016 22:55:35     1
>>> 1144127 0.55500 sas64      username       Eqw   09/27/2016 22:56:25     1
>>> 1144130 0.55500 sas64      username       Eqw   09/27/2016 22:57:15     1
>>> 1144134 0.55500 sas64      username       Eqw   09/27/2016 22:58:05     1
>>> 1144139 0.55500 sas64      username       Eqw   09/27/2016 22:58:55     1
>>> 1144142 0.55500 sas64      username       Eqw   09/27/2016 22:59:46     1
>>> 1144145 0.55500 sas64      username       Eqw   09/27/2016 23:00:36     1
>>> 1144151 0.55500 sas64      username       Eqw   09/27/2016 23:01:26     1
>>> 1144156 0.55500 sas64      username       Eqw   09/27/2016 23:02:16     1
>>> 1144161 0.55500 sas64      username       Eqw   09/27/2016 23:03:06     1
>>> 1144165 0.55500 sas64      username       Eqw   09/27/2016 23:03:56     1
>>> 1144169 0.55500 sas64      username       Eqw   09/27/2016 23:04:46     1
>>> 1144174 0.55500 sas64      username       Eqw   09/27/2016 23:05:36     1
>>> 1144177 0.55500 sas64      username       Eqw   09/27/2016 23:06:26     1
>>> 1144182 0.55500 sas64      username       Eqw   09/27/2016 23:07:17     1
>>> 1144186 0.55500 sas64      username       Eqw   09/27/2016 23:08:07     1
>>> 1144196 0.55500 sas64      username       Eqw   09/27/2016 23:08:57     1
>>> 1144204 0.55500 sas64      username       Eqw   09/27/2016 23:09:47     1
>>> 1144212 0.55500 sas64      username       Eqw   09/27/2016 23:10:37     1
>>> 1144217 0.55500 sas64      username       Eqw   09/27/2016 23:11:27     1
>>> 1144221 0.55500 sas64      username       Eqw   09/27/2016 23:12:17     1
>>> 1144224 0.55500 sas64      username       Eqw   09/27/2016 23:13:08     1
>>> 1144225 0.55500 sas64      username       Eqw   09/27/2016 23:13:58     1
>>> 1144227 0.55500 sas64      username       Eqw   09/27/2016 23:14:48     1
>>> 1144232 0.55500 sas64      username       Eqw   09/27/2016 23:15:38     1
>>> 1144236 0.55500 sas64      username       Eqw   09/27/2016 23:16:28     1
>>> 1144244 0.55500 sas64      username       Eqw   09/27/2016 23:17:18     1
>>> 1144255 0.55500 sas64      username       Eqw   09/27/2016 23:18:09     1
>>> 1144265 0.55500 sas64      username       Eqw   09/27/2016 23:18:59     1
>>> 1144276 0.55500 sas64      username       Eqw   09/27/2016 23:19:49     1
>>> 1144286 0.55500 sas64      username       Eqw   09/27/2016 23:20:39     1
>>> 1144295 0.55500 sas64      username       Eqw   09/27/2016 23:21:29     1
>>> 1144306 0.55500 sas64      username       Eqw   09/27/2016 23:22:19     1
>>> 1144316 0.55500 sas64      username       Eqw   09/27/2016 23:23:09     1
>>> 1144326 0.55500 sas64      username       Eqw   09/27/2016 23:23:59     1
>>> 1144335 0.55500 sas64      username       Eqw   09/27/2016 23:24:49     1
>>> 1144344 0.55500 sas64      username       Eqw   09/27/2016 23:25:39     1
>>> 1144351 0.55500 sas64      username       Eqw   09/27/2016 23:26:30     1
>>> 1144359 0.55500 sas64      username       Eqw   09/27/2016 23:27:20     1
>>> 1144366 0.55500 sas64      username       Eqw   09/27/2016 23:28:10     1
>>> 1144374 0.55500 sas64      username       Eqw   09/27/2016 23:29:00     1
>>> 1144416 0.55500 sas64      username       Eqw   09/27/2016 23:29:50     1
>>> 1144482 0.55500 sas64      username       Eqw   09/27/2016 23:30:40     1
>>> 1144484 0.55500 sas64      username       Eqw   09/27/2016 23:31:30     1
>>> 1144485 0.55500 sas64      username       Eqw   09/27/2016 23:32:20     1
>>> 1144486 0.55500 sas64      username       Eqw   09/27/2016 23:33:10     1
>>> 1144487 0.55500 sas64      username       Eqw   09/27/2016 23:34:00     1
>>> 1144491 0.55500 sas64      username       Eqw   09/27/2016 23:34:51     1
>>> 1144498 0.55500 sas64      username       Eqw   09/27/2016 23:35:41     1
>>> 1144499 0.55500 sas64      username       Eqw   09/27/2016 23:36:31     1
>>> 1144500 0.55500 sas64      username       Eqw   09/27/2016 23:37:21     1
>>> _______________________________________________
>>> users mailing list
>>> users@gridengine.org
>>> https://gridengine.org/mailman/listinfo/users
>> 
> 

