Hi;
I prefer to use an epilog script to store the job information under a
top-level directory owned by the slurm user. To avoid a single directory
with a huge number of files, it creates one sub-directory per thousand job
files. For a job whose job ID is 230988, it creates a directory named
230XXX. The SLURM_JOB_ID of a job array is also a problem, because Slurm
uses an ugly format (298903_[3%1]). For these reasons, my script is a
little complex, but it works (I cropped the other, non-relevant parts):
#!/bin/bash
# Array tasks are looked up as <array_job_id>_<task_id>, other jobs by plain job ID.
if [ -n "$SLURM_ARRAY_JOB_ID" ]
then
    JOBNO="${SLURM_ARRAY_JOB_ID}_${SLURM_ARRAY_TASK_ID}"
else
    JOBNO="${SLURM_JOB_ID}"
fi
# Strip the task suffix so all tasks of an array are appended to the same file.
JI=${JOBNO//_*/}
JWIDE=${#JI}
# Keep everything except the last three digits, e.g. 230988 -> 230XXX.
JIDLEN=$((JWIDE-3))
JDIR=/okyanus/SLURM/log/jobs/${JI:0:$JIDLEN}XXX
OUT="$JDIR/${JI}.txt"
SEP="==========================================================================="
echo "$SEP" &>>"$OUT"
# Append the job details, a separator, and then the batch script itself.
scontrol show job -dd "$JOBNO" &>>"$OUT" &&
    echo "$SEP" >>"$OUT" &&
    scontrol write batch_script "$SLURM_JOBID" - >>"$OUT"
exit 0
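
The directory naming used above can also be sketched as a small standalone function, which makes it easy to try outside of an epilog. This is only an illustration: the name job_bucket is hypothetical and not part of the original script, and clamping the prefix length to zero for job IDs shorter than four digits is an added assumption (with a negative length, the substring expansion behaves differently depending on the bash version).

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script): map a job ID,
# with or without an array task suffix, to its per-thousand directory name.
job_bucket() {
    local id=${1%%_*}              # drop a suffix such as _3 or _[3%1]
    local keep=$(( ${#id} - 3 ))   # number of leading digits to keep
    if (( keep < 0 )); then
        keep=0                     # assumption: short IDs all go into "XXX"
    fi
    printf '%sXXX\n' "${id:0:keep}"
}
```

For example, job_bucket 230988 prints 230XXX, and an epilog could call mkdir -p on the resulting path before writing to it, in case the directory does not exist yet.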
Regards;
Ahmet M.
On 23.04.2020 at 10:33, Gestió Servidors wrote:
Hello,
When a job is “pending” or “running”, with “scontrol show
jobid=#jobnumber” I can get some useful information, but once the job
has finished, that command no longer returns anything. For example, if I
run “sacct” and see that some jobs have finished with state
“FAILED”, how can I get detailed information about those jobs?
Thanks.