On 21.09.2012 at 16:13, Julien Nicoulaud wrote:

> I tried to implement the -notify + trap USR2 solution, but could not get it 
> to work. I can trap the USR2 signal in the qmaster PE script, but as soon as 
> it is sent, the slave tasks get killed, leaving my application no time to 
> cleanly shut them down. The qmaster log displays:

Is this a new question? Originally you wanted a proper exit code for 
-sync y; now you want to shut down gracefully.

-- Reuti


> tightly integrated parallel task 61969.1 task 1.computeXX failed - killing job
> 
> The queue is configured with "notify 00:00:60", so that should leave at least 
> one minute. I also tried to trap USR2 in the PE script and not forward it at 
> all to child processes, but slave tasks still get killed. Is there something else 
> specific to do to avoid this?
> 
> 2012/9/19 Julien Nicoulaud <[email protected]>
> Yes, that's what I meant. For me, if control_slaves is FALSE, qsub returns 
> with a non-zero exit code after h_rt has elapsed.
> 
> 
> 2012/9/19 Reuti <[email protected]>
> Hi,
> 
> On 19.09.2012 at 14:36, Julien Nicoulaud wrote:
> 
> > On SGE 6.2u5, I submit jobs with -sync y and h_rt. When the job gets 
> > killed after the time has elapsed, qsub prints an "Unable to run job" message 
> > but exits with code 0. I tried to trap the KILL signal inside the job 
> > script, but it does not seem to affect the qsub return code. Is 
> > it possible to make it return 1?
> >
> > Note: it only behaves this way for jobs running in a tightly integrated 
> > parallel environment. In a loosely integrated PE, qsub returns 1 in this 
> > case...
> 
> You mean the setting of "control_slaves"? For me it's always 0 if I request a 
> PE.
> 
> -- Reuti
> 
> 
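
For readers following the "-notify + trap USR2" idea discussed above, here is a minimal sketch of what such a job script might look like. It relies on the documented qsub behaviour that a job submitted with -notify receives SIGUSR2 ahead of SIGKILL (and SIGUSR1 ahead of SIGSTOP), with the delay taken from the queue's "notify" setting. The worker command (sleep 30) and the self-sent signal are placeholders so the sketch can run outside SGE; they are assumptions, not part of the original setup. It does not address the tightly integrated PE case reported in this thread, where the slave tasks are killed anyway.

```shell
#!/bin/sh
# Sketch of a job script meant to be submitted with "qsub -notify".
# Everything named here (worker command, variable names) is illustrative.

shutdown_requested=0
worker_status=0

# Handler for the SIGUSR2 warning that precedes SIGKILL under -notify.
on_usr2() {
    shutdown_requested=1
}
trap on_usr2 USR2

# Hypothetical stand-in for the real application.
sleep 30 &
worker_pid=$!

# Simulate the scheduler's notification so the sketch runs without SGE.
# In a real job this line is removed; SGE delivers the signal itself.
( sleep 1; kill -USR2 $$ ) &

# wait returns early with a status above 128 when a trapped signal arrives.
wait "$worker_pid" || true

if [ "$shutdown_requested" -eq 1 ]; then
    # Ask the worker to stop, then collect its exit status.
    kill -TERM "$worker_pid" 2>/dev/null || true
    wait "$worker_pid" 2>/dev/null || worker_status=$?
    echo "worker stopped after USR2 (status $worker_status)"
fi
```

The handler only sets a flag; the actual cleanup happens in the main flow after wait returns, which keeps the trap short and avoids doing real work inside a signal handler.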


_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
