Hi,

> On 22.12.2016 at 01:47, Jin Li <lijin....@gmail.com> wrote:
> 
> Hi all,
> 
> Could you help show me how to get the exit code of the running command
> submitted to SGE?
> 
> For example, I have two jobs A and B submitted to SGE, and B depends
> on A. The job B wants to execute only if commands in the job A exit
> successfully. Then how could I get the exit code of the running
> commands in the job A? Thanks for your help.

The -hold_jid option will start job B as soon as job A has left the cluster.
Whether job A completed successfully or not is not taken into account. To
implement a workflow which honors the result of A, you have several options.

First of all, it's desirable to return the exit code of the application as
the exit code of the complete job, independent of any post-processing, as you
then have the chance to check it later when inspecting the accounting of the
job:

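#!/bin/sh
mybinary < input > output
# Save the application's exit code before another command overwrites $?.
joberror=$?
any_post_processing_you_have_to_perform
# Report the application's exit code as the exit code of the whole job.
exit $joberror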

Then you can also check the exit code with `qacct -j <job_id>`. While it would
be possible to do this in job B, there might be a race condition: the
accounting record is written only after job A has left the cluster, and hence
may not yet exist when job B starts. Whether you use `qacct -j <job_id>` or scan
the accounting file (which is a plain text file) manually, job B has to loop
until an entry for the job in question exists.
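
The polling loop described above could look roughly like the following sketch. It assumes that `qacct -j <job_id>` prints a line beginning with `exit_status` once the accounting record exists; the function names, retry count and delay are made up for illustration. For array jobs only the first record found is reported.

```shell
#!/bin/sh
# Sketch only: wait for job A's accounting record and print its exit status.
# Assumption: qacct output contains a line of the form "exit_status  <n>".

# Extract the first exit_status value from qacct output on stdin.
parse_exit_status() {
    awk '$1 == "exit_status" { print $2; exit }'
}

# wait_for_exit_status JOB_ID [MAX_TRIES] [SLEEP_SECONDS]
# Prints the exit status and returns 0, or returns 1 if no record shows up.
wait_for_exit_status() {
    job_id=$1
    tries=${2:-30}
    delay=${3:-10}
    while [ "$tries" -gt 0 ]; do
        status=$(qacct -j "$job_id" 2>/dev/null | parse_exit_status)
        if [ -n "$status" ]; then
            echo "$status"
            return 0
        fi
        tries=$((tries - 1))
        sleep "$delay"
    done
    return 1
}
```

In job B this might be used as `status=$(wait_for_exit_status 6437358) || exit 1`, followed by a test of `$status` before the real work starts.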

In case the exechosts are also submission hosts, job A can change the settings
of job B: either by removing a hold on this job (set in addition to the
-hold_jid), or, in case it is sensible that job B starts anyway, by setting a
job context or environment variable for job B to indicate which processing it
should perform.
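
As a sketch of the first variant: job B is submitted with a user hold in addition to -hold_jid, and job A releases that hold with `qrls` only when its application succeeded. The function name and the job name `job_B` are placeholders, and the exechost must be allowed to act as a submission host for `qrls` to work there.

```shell
#!/bin/sh
# Sketch only: release the user hold on the follow-up job if job A succeeded.
# release_on_success EXIT_CODE NEXT_JOB
release_on_success() {
    if [ "$1" -eq 0 ]; then
        qrls "$2"
    else
        echo "exit code $1: leaving $2 on hold" >&2
    fi
}

# Possible use inside job A's script (placeholders as in the example above):
#   mybinary < input > output
#   joberror=$?
#   release_on_success "$joberror" job_B
#   exit $joberror
```

With this arrangement a failed job A leaves job B held in the queue, where it can be inspected, released by hand or deleted.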

As long as you can address the job by name (which can be used instead of the
job number in many SGE commands), it might be easy to know the name of job B
already in job A.

Otherwise it's possible to attach a job context to job A at run time as well
(unlike environment variables). I.e. when you submit job B and now know its
job number, it can be attached to job A by
`qalter -ac NEXT_JOB_ID=123456 6437358`, where the first number is the job id
of job B and the latter is the known job number of job A. To script this you
may also look into the option "-terse". The NEXT_JOB_ID (or any other name you
prefer) can be checked in job A by:

qstat -j $JOB_ID | sed -n -e "/^context/s/^context: *//p" | tr "," "\n"

just before it exits to take proper action. Please let me know, in case you 
need further details.
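
Put together, the job context approach might be scripted along these lines. This is only a sketch: the helper name `context_value` and the script names `job_a.sh` and `job_b.sh` are invented for illustration, and the parsing assumes that the `context:` line of `qstat -j` holds comma-separated NAME=value pairs, as the sed command above does.

```shell
#!/bin/sh
# Sketch only: look up one context variable in `qstat -j`-style output on stdin.
# context_value NAME
context_value() {
    sed -n -e '/^context/s/^context: *//p' | tr ',' '\n' |
        awk -F= -v key="$1" '$1 == key { print $2; exit }'
}

# At submission time (hypothetical script names; -terse prints only the job id):
#   a_id=$(qsub -terse job_a.sh)
#   b_id=$(qsub -terse -h -hold_jid "$a_id" job_b.sh)
#   qalter -ac NEXT_JOB_ID="$b_id" "$a_id"
#
# Inside job A, just before it exits:
#   next=$(qstat -j "$JOB_ID" | context_value NEXT_JOB_ID)
```

Job A can then pass `$next` to whatever action fits the result, for example releasing the hold on that job.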

-- Reuti