On Mon, Sep 22, 2014 at 02:01:31PM +0000, Paul Jewell [Rudraya] wrote:
Hello,

I am trying to fix a situation in which many long-running jobs are being run on the grid and another job with a
higher priority needs to be run immediately. The issue I am running into is the behavior of the
"suspend" function in the gridengine "qmod" command. It seems that after suspending a job with -sj
<job>, the job changes to the "s" state but continues to sit in the queue for the node
instead of letting other waiting jobs start. Is this the intended behavior? How can I suspend
jobs to allow other jobs to run in their place, or is it required to delete them completely?

When a job is suspended, it is literally sent a SIGSTOP, and processing
halts.  It remains in the queue, and on the exec host it also remains in
memory (although since the process is suspended, it is more likely to
be moved to swap if other processes need physical RAM).
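
To see the SIGSTOP behavior outside of gridengine, here is a minimal
Python sketch.  It is not how SGE itself is implemented; it only sends
the same signals by hand to a throwaway child process (the short sleep
command stands in for a job and is purely an assumption for the demo).

```python
import os
import signal
import subprocess
import sys
import time

# Hypothetical stand-in for a job: a child that would exit after 0.5 s.
child = subprocess.Popen(
    [sys.executable, "-c", "import time; time.sleep(0.5)"]
)

# Suspend it, the same signal described above for a suspended job.
os.kill(child.pid, signal.SIGSTOP)

# Wait longer than the child's own runtime.  Because it is stopped,
# it cannot finish; the process still exists and keeps its memory.
time.sleep(1.0)
still_present = child.poll() is None

# Resume it and let it run to completion.
os.kill(child.pid, signal.SIGCONT)
exit_code = child.wait(timeout=10)

print("still present while stopped:", still_present)
print("exit code after resume:", exit_code)
```

The stopped child never goes away on its own, which is the same reason
a suspended job keeps holding its place on the exec host.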

It is important to realize that a suspended job still uses a slot.  In a
simple case of one host => one queue => one slot, suspending a job
doesn't help much.  In your case, you may want to look at configuring
multiple queues on the same exec hosts:  one for normal jobs, and a
restricted/limited access queue for high priority jobs.

This question comes up a lot, and there are a number of resources out
there already.  Look for "SGE subordinate queues" to get started.
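
As a rough sketch of what a subordinate queue setup can look like,
assuming the normal jobs run in a queue named all.q and the restricted
queue is called high.q (both names are placeholders, so substitute your
own), the high priority queue's configuration, edited with
"qconf -mq high.q", would carry an entry along these lines.  The first
line is only an annotation and not part of the configuration, and all
other fields are omitted:

```
# illustrative excerpt only; queue names are assumptions
qname                 high.q
subordinate_list      all.q=1
```

With a threshold of 1, the all.q instance on a host should be suspended
as soon as one slot of high.q on that same host is in use, and resumed
once high.q is empty again.  Check queue_conf(5) for your version
before relying on the details.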

You might also try looking at page 42 of this presentation:
http://beowulf.rutgers.edu/info-user/pdf/ge_presentation.pdf

--
Jesse Becker (Contractor)
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
