Hello,

I found unexpected behavior when setting both hard and soft time limits and rescheduling automatically on SIGUSR1.
This is my test script:

#!/bin/bash
#$ -S /bin/bash
#$ -l s_rt=0:0:5,h_rt=0:0:10
#$ -j y
set -x
set -e
set -o pipefail
set -u
trap "exit 99" SIGUSR1
trap "exit 2" SIGTERM
echo "hello world"
sleep 15

I expect it to be rescheduled indefinitely, once every time s_rt lapses. Instead, it is rescheduled only once: on the second run the job receives only SIGTERM and exits. Here is the script's output:

node140
+ set -e
+ set -o pipefail
+ set -u
+ trap 'exit 99' SIGUSR1
+ trap 'exit 2' SIGTERM
+ echo 'hello world'
hello world
+ sleep 15
User defined signal 1
++ exit 99
node069
+ set -e
+ set -o pipefail
+ set -u
+ trap 'exit 99' SIGUSR1
+ trap 'exit 2' SIGTERM
+ echo 'hello world'
hello world
+ sleep 15
Terminated
++ exit 2

The execd logs confirm that the second time the job was killed for exceeding h_rt:

06/08/2018 21:20:15| main|node140|W|job 8030395.1 exceeded soft wallclock time - initiate soft notify method
06/08/2018 21:20:59| main|node140|E|shepherd of job 8030395.1 exited with exit status = 25
06/08/2018 21:21:45| main|node069|W|job 8030395.1 exceeded hard wallclock time - initiate terminate method

And here is the accounting information:

==============================================================
qname        short.q
hostname     node140
group        everyone
owner        ilya
project      project.p
department   defaultdepartment
jobname      reshed_test.sh
jobnumber    8030395
taskid       undefined
account      sge
priority     0
qsub_time    Fri Jun 8 21:19:40 2018
start_time   Fri Jun 8 21:20:09 2018
end_time     Fri Jun 8 21:20:15 2018
granted_pe   NONE
slots        1
failed       25 : rescheduling
exit_status  99
ru_wallclock 6
...
==============================================================
qname        short.q
hostname     node069
group        everyone
owner        ilya
project      project.p
department   defaultdepartment
jobname      reshed_test.sh
jobnumber    8030395
taskid       undefined
account      sge
priority     0
qsub_time    Fri Jun 8 21:19:40 2018
start_time   Fri Jun 8 21:21:39 2018
end_time     Fri Jun 8 21:21:50 2018
granted_pe   NONE
slots        1
failed       0
exit_status  2
ru_wallclock 11
...

Is there anything in the configuration I could be missing? I am running 6.2u5.

Thank you,
Ilya
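For reference, the trap logic itself can be checked without Grid Engine. Below is a minimal sketch. It assumes that execd sends SIGUSR1 when s_rt lapses and SIGTERM when h_rt lapses, which is consistent with the log lines and the output above. The signals are sent by hand with kill, and run_with_signal is a hypothetical helper. There is one deliberate difference from the test script: the "work" runs in the background under wait. Bash normally defers a trap until a foreground command finishes. In the real runs that deferral did not matter because the signal also killed sleep, which is why bash printed "User defined signal 1" and "Terminated".

```shell
#!/bin/bash
# Exercise the trap logic from the test script outside Grid Engine.
# Assumption: execd sends SIGUSR1 when s_rt lapses and SIGTERM when h_rt
# lapses; here the signals are sent by hand with kill instead.

# Hypothetical helper: start a stand-in job, send it one signal, and report
# the exit status the job ends with.
run_with_signal() {
  local sig=$1
  (
    # Same exit codes as the test script; also stop the "work" process.
    trap 'kill "$child" 2>/dev/null; exit 99' USR1
    trap 'kill "$child" 2>/dev/null; exit 2' TERM
    sleep 15 &   # background + wait lets the trap run as soon as the signal arrives
    child=$!
    wait "$child"
  ) &
  local job=$!
  sleep 1      # give the subshell time to install its traps
  kill -s "$sig" "$job"
  local status=0
  wait "$job" || status=$?
  echo "$sig -> exit $status"
}

run_with_signal USR1   # expected: USR1 -> exit 99
run_with_signal TERM   # expected: TERM -> exit 2
```

If both lines come out as in the comments, the script handles each signal as intended. That would suggest the problem is simply that the second run never received SIGUSR1.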
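The two accounting records can also be compared with the requested limits mechanically. This sketch copies only the relevant qacct fields from the records above (abridged, same values). It assumes S_RT=5 and H_RT=10 seconds, restated from "-l s_rt=0:0:5,h_rt=0:0:10". RECORDS and summarise are hypothetical names.

```shell
#!/bin/bash
# Compare the two accounting records with the requested limits.
# Assumptions: only the relevant qacct fields are copied (abridged), and
# S_RT/H_RT restate "-l s_rt=0:0:5,h_rt=0:0:10" in seconds.
S_RT=5
H_RT=10

RECORDS='==============================================================
hostname     node140
failed       25 : rescheduling
exit_status  99
ru_wallclock 6
==============================================================
hostname     node069
failed       0
exit_status  2
ru_wallclock 11'

# Hypothetical helper: print one line per run showing where it ran, how it
# ended, and which limits its wallclock time had passed.
summarise() {
  awk -v s_rt="$S_RT" -v h_rt="$H_RT" '
    function report(   past_s, past_h) {
      past_s = (wall + 0 > s_rt + 0) ? "yes" : "no"
      past_h = (wall + 0 >= h_rt + 0) ? "yes" : "no"
      printf "%s failed=%s exit=%s wall=%ss past_s_rt=%s past_h_rt=%s\n",
        host, failed, status, wall, past_s, past_h
    }
    /^=+$/               { if (host != "") report(); host = ""; next }
    $1 == "hostname"     { host = $2 }
    $1 == "failed"       { failed = $2 }
    $1 == "exit_status"  { status = $2 }
    $1 == "ru_wallclock" { wall = $2 }
    END                  { if (host != "") report() }
  '
}

printf '%s\n' "$RECORDS" | summarise
```

Both runs went past the 5-second soft limit. However, only the first ended with the rescheduling exit code 99. The second ran until the hard limit and exited with 2.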
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users