New thread since I have narrowed down the problem.
Consider the script:
******************************
#!/bin/bash
#SBATCH -p cheap
#SBATCH -n 32
#SBATCH -t 12:00:00
sig_term()
{
echo "function sig_term called. Exiting"
echo 'sig_term' > slask_term
date >> slask_term
}
# associate the function "sig_term" with the TERM signal
trap 'sig_term' SIGTERM
sleep 400 &
wait $!
******************************
If I run this from bash (./script-name) and then send it a TERM signal with
kill, the signal is recognized immediately. However, if I submit it using sbatch
(sbatch script-name) and then submit a job to a higher-priority partition, the
TERM signal seems to be sent only AFTER the GraceTime has expired.
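For reference, the local (non-Slurm) check described above can be sketched
roughly as follows. This is only an illustration: the scratch directory, the
file name job.sh and the shortened sleep are assumptions made for the sketch,
not part of the real job script, and it says nothing about what slurmd does.

```shell
#!/bin/bash
# Local check only (no Slurm involved): run a cut-down copy of the handler
# script in the background, send it TERM, and time how long the trap takes.
workdir=$(mktemp -d)    # hypothetical scratch directory for this sketch

cat > "$workdir/job.sh" <<'EOF'
#!/bin/bash
sig_term()
{
echo 'sig_term' > slask_term
date >> slask_term
}
trap 'sig_term' SIGTERM
sleep 10 &
wait $!
EOF

cd "$workdir" || exit 1
bash job.sh > job.out 2>&1 &
job_pid=$!
sleep 1                  # give the script time to install the trap
start=$(date +%s)
kill -TERM "$job_pid"
wait "$job_pid"          # returns once the script has run its handler and exited
elapsed=$(( $(date +%s) - start ))
echo "handler ran after ${elapsed}s: $(head -n 1 slask_term)"
```

The "sleep ... &" followed by "wait" matters here: bash only runs a trap once
the foreground command returns, so a plain foreground sleep would delay the
handler until the sleep finished.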
Partitions defined:
PartitionName=cheap Nodes=ALL Priority=1 PreemptMode=CANCEL GraceTime=10
Default=YES MaxTime=INFINITE State=UP:
PartitionName=paid_jobs Nodes=ALL Priority=1000 PreemptMode=OFF Default=YES
MaxTime=INFINITE State=UP:
I see the same behavior on 17.02.7-1.el7 and 15.08.13-1.el7. Can someone confirm
whether the signals are indeed sent when the higher-priority job is detected,
and not ONLY at the end of the GraceTime period? Or otherwise tell me what I'm
doing wrong (which may well be the more probable scenario).
Thanks,
/jon