Hi,

On 29.12.2011 at 19:15, Semi wrote:

> 
> I have a public queue intel_all.q and a private queue namd.q with
> subordinate_list      intel_all.q=1
> 
> Some nodes of namd.q are also included in intel_all.q, with
> suspend_method        /storage/Scripts/job_resubmit.sh $job_id
> 
> cat /storage/Scripts/job_resubmit.sh
> #!/bin/sh
> /storage/SGE/bin/lx24-amd64/qresub $1
> /storage/SGE/bin/lx24-amd64/qdel $1

Is it happening only on certain hosts? In this configuration, all exechosts
also need to be submit hosts, because the suspend_method runs `qresub` on the
execution host. The other annoyance I see is that the resubmitted jobs are
pushed to the end of the list of waiting jobs again.
 

> When even one job is submitted to the private queue, the public jobs have
> to be resubmitted and killed.
> Sometimes it doesn't work; they get status S (suspended)
> sge143                  lx24-amd64     24 45.65   47.3G   30.4G   48.0G     0.0
>    namd.q               BIP   24/24    
>    intel_all.q          BIP   23/24    S
> 
> 5219266 0.50511 SemanticEx alexla       S     12/29/2011 16:34:08 intel_all.q@sge143                 1

Maybe the `qdel` didn't succeed. You can check the messages files of the
qmaster and the exechost to see whether it was issued and executed. If the job
isn't deleted or stopped by a signal, it will keep running, as you observe
right now.
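
To make such failures visible, the suspend script could check the exit status
of each command and write it to a log. The following is only a sketch: the log
location (RESUB_LOG) and the function name are assumptions for illustration,
and the binary directory is the one from the script quoted above.

```shell
#!/bin/sh
# Directory with the SGE binaries (taken from the original script).
SGE_BIN="${SGE_BIN:-/storage/SGE/bin/lx24-amd64}"
# Assumed log location; pick any path the job owner can write to.
RESUB_LOG="${RESUB_LOG:-/tmp/job_resubmit.log}"

resubmit_and_delete() {
    job_id="$1"
    if [ -z "$job_id" ]; then
        echo "usage: resubmit_and_delete <job_id>" >&2
        return 2
    fi
    # Resubmit first; if this fails, leave the original job alone.
    if ! "$SGE_BIN/qresub" "$job_id" >>"$RESUB_LOG" 2>&1; then
        echo "$(date) qresub failed for job $job_id" >>"$RESUB_LOG"
        return 1
    fi
    # Delete the original job and record a failure instead of ignoring it.
    if ! "$SGE_BIN/qdel" "$job_id" >>"$RESUB_LOG" 2>&1; then
        echo "$(date) qdel failed for job $job_id" >>"$RESUB_LOG"
        return 1
    fi
    echo "$(date) job $job_id resubmitted and deleted" >>"$RESUB_LOG"
}

# Run only when a job id is passed, as with: suspend_method <script> $job_id
[ "$#" -eq 0 ] || resubmit_and_delete "$1"
```

A "qdel failed" entry in the log would then match the case where the job stays
in state S and keeps running.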

I would suggest removing the suspend_method and defining a checkpointing
interface that is attached to intel_all.q. To reach this queue, it is then
sufficient to request the checkpointing interface. When the checkpointing
interface is set up to migrate on suspend, the job (still with the same
job number) will be requeued automatically.

http://comments.gmane.org/gmane.comp.clustering.opengridengine.user/2193
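
As a rough sketch of such a checkpointing interface (in the format shown by
`qconf -sckpt`): the name requeue_on_suspend and the ckpt_dir are assumptions
for illustration, and the exact fields should be checked against checkpoint(5)
for the installed version. The "x" in "when" is what requests migration when
the job gets suspended.

```
ckpt_name          requeue_on_suspend
interface          APPLICATION-LEVEL
ckpt_command       NONE
migr_command       NONE
restart_command    NONE
clean_command      NONE
ckpt_dir           /tmp
signal             NONE
when               xs
```

The interface would then be listed in the ckpt_list of intel_all.q, and jobs
would request it with `qsub -ckpt requeue_on_suspend`.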

-- Reuti


> and are still actually running and taking resources of the node (CPU & memory).
> How can I solve this problem?
> 
> 
> 
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users

