To Grid Engine Users,

The "queue_conf" man page, under both the "prolog" and "epilog" attributes, 
says the following with regard to exit codes:

Exit codes for the prolog/epilog attribute can be interpreted based on the 
following exit values:
              0: Success
              99: Reschedule job
              100: Put job in error state
              Anything else: Put queue in error state

When a job and/or job task from an array job exits with an exit code of 100, we 
see something like the following from the qstat command:

job-ID     prior   name       user         state submit/start at     queue      jclass                         slots ja-task-ID
-------------------------------------------------------------------------------------------------------------------------------
      1009 0.54976 epi_ex100  tdhf781      Eqw   09/08/2016 13:11:42                                               1

What I would like to ask is whether there is a way to "trap" all exit codes 
other than 0, 99 and 100 so that the affected jobs or job tasks show up with a 
job state of "Eqw" or some other error state.

In the epilog script that I've set up for our jobs, I've attempted to capture 
the value of the "exit_status" of a job or job task and, if it isn't 0, 99 or 
100, exit the epilog script with an "exit 100". However, this doesn't appear 
to work.
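
For reference, the logic described above can be sketched roughly as follows. 
This is only an illustration of the attempted approach, not a confirmed fix. 
It assumes that the job's spool directory (SGE_JOB_SPOOL_DIR) contains a 
"usage" file with a line of the form "exit_status=<n>" at the time the epilog 
runs; the function name epilog_code_for is hypothetical, and returning 0 for a 
job exit status of 99 or 100 is a choice made for this sketch.

```shell
#!/bin/sh
# Sketch of an epilog that maps unexpected job exit codes to "exit 100".
# Assumption: $SGE_JOB_SPOOL_DIR/usage holds a line "exit_status=<n>".

# Print the exit code the epilog should return for a given usage file.
# (epilog_code_for is a hypothetical helper name.)
epilog_code_for() {
    usage_file=$1
    # Take the number after "exit_status="; empty if the line is missing.
    status=$(sed -n 's/^exit_status=\([0-9][0-9]*\)$/\1/p' "$usage_file" 2>/dev/null | head -n 1)
    case "$status" in
        0|99|100) echo 0 ;;    # expected values: epilog itself succeeds
        '')       echo 0 ;;    # status unknown: do not flag the job
        *)        echo 100 ;;  # anything else: ask for job error state
    esac
}

# Only act when run as an epilog, i.e. when the spool directory is set.
if [ -n "${SGE_JOB_SPOOL_DIR:-}" ]; then
    exit "$(epilog_code_for "$SGE_JOB_SPOOL_DIR/usage")"
fi
```

A script like this would be referenced from the "epilog" attribute of the 
queue configuration. Whether the scheduler then actually places the job in 
Eqw state is exactly the open question here.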

Another way of stating what I'm trying to convey: if the exit status of a job 
or job task is anything other than 0, 99 or 100, put the job in error state. 
If this can be done, then we would know that a job didn't complete correctly, 
and if it is in Eqw state we have the option of clearing the error state (i.e. 
qmod -cj) and re-executing the job.

Wayne Lee
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users