On 28.09.2012 at 05:14, Vamsi Krishna wrote:

> Exit status 140: I read that somewhere on the internet, excuse me if it is
> wrong. May I get more details about this exit status and why the job is
> killed with signal 12? Actually, nothing is in
> /default/spool/`hostname`/messages. I found the messages only in
> qmaster/messages.
>
> I found only one message in default/spool/`hostname`/messages

Maybe the log level needs to be adjusted:

$ qconf -sconf
...
loglevel                     log_info

-- Reuti


> starting up SGE 6.2u5 (lx24-amd64)
>
> Regards
> PVK
>
> On Thu, Sep 27, 2012 at 11:50 PM, Reuti <[email protected]> wrote:
> On 27.09.2012 at 19:41, Vamsi Krishna wrote:
>
>> Those were inputs for debugging.
>>
>> job 1058200.1 failed on host assumedly after job because: job 1058200.1
>> died through signal USR2 (12)
>>
>> 09/26/2012 17:47:02|worker|E|denied: job "1058200" does not exist
>>
>> 50 out of 80 batch jobs got killed in a similar way, and one of the
>> jobs in the queue was also killed. Does qmaster need a reboot?
>>
>> On Thu, Sep 27, 2012 at 9:39 PM, Reuti <[email protected]> wrote:
>> On 26.09.2012 at 13:48, Vamsi Krishna wrote:
>>
>>> Exit code 140: The job exceeded the "wall clock" time limit, h_rt is set to
>>> infinity
>
> Who stated that exit code 140 is "wall clock" exceeded and nothing else? Did
> you verify it in the messages file of the shepherd on the node's spooling
> directory?
>
> -- Reuti
>
>
>>> submit with -notify by default.
>>
>> Is this a statement or a question? There can be more reasons for SIGUSR2,
>> such as an exceeded memory limit as a result of -notify, or it may just be
>> the warning sent because someone killed the job with `qdel`.
>>
>> How can it run into h_rt when it's set to infinity?
>>
>> -- Reuti
>>
>>> --PVK
>>>
>>> On Wed, Sep 26, 2012 at 12:46 PM, Reuti <[email protected]> wrote:
>>> On 26.09.2012 at 08:53, Vamsi Krishna wrote:
>>>
>>> > Some of the batch jobs are killed, and qacct -j of the job ID shows:
>>> >
>>> > failed 100 : assumedly after job
>>> > exit_status 140
>>>
>>> It's 128 + 12 = SIGUSR2. So what can cause this signal to be generated?
>>>
>>> Something in your job?
>>>
>>> You submit with -notify?
>>>
>>> -- Reuti
>>>
>>> >
>>> > What could be the reason?
>>> >
>>> > Regards
>>> > PVK
>>> >
>>> > _______________________________________________
>>> > users mailing list
>>> > [email protected]
>>> > https://gridengine.org/mailman/listinfo/users
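
The arithmetic Reuti gives in the thread (128 + 12 = SIGUSR2) follows the common shell convention that an exit status above 128 means the process was terminated by signal number (status - 128). Below is a minimal sketch of that decoding, assuming a POSIX shell. The function name decode_exit_status is hypothetical and is not a Grid Engine command. Signal numbers differ between platforms; 12 is SIGUSR2 on Linux x86-64, which matches the lx24-amd64 architecture mentioned in the thread.

```shell
#!/bin/sh
# decode_exit_status is a hypothetical helper, not part of SGE.
# Convention assumed here: an exit status above 128 means the process
# was terminated by signal number (status - 128).
decode_exit_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        echo "signal $((status - 128))"
    else
        echo "exit $status"
    fi
}

# The exit_status reported by qacct -j in this thread.
decode_exit_status 140
```

To map the resulting number to a name on a given host, `kill -l 12` can be run there; the name it prints depends on the platform.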
