Hi, everyone --

Our take on using epilog is likely familiar to many, but perhaps not all. Here is an extract from epilog:

/usr/local/slurm/epilogctld:/usr/bin/scontrol show job=$SLURM_JOB_ID --oneliner >> /usr/local/slurm/slurmrecord/$((SLURM_JOB_ID/10000)).record
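
To illustrate the file naming: the integer division by 10000 puts each block of 10000 consecutive job IDs into the same file. Below is a minimal sketch, not part of our epilog. The helper names record_file_for and lookup_job are hypothetical, and the lookup assumes each appended line begins with "JobId=<id> ", which is how the --oneliner output appears to start.

```shell
#!/bin/bash
# Hypothetical helpers for illustration only (not in the epilog above).

# Print the record file that a given job ID would be appended to.
# The second argument (directory) is optional and defaults to the path above.
record_file_for() {
    local jobid=$1 dir=${2:-/usr/local/slurm/slurmrecord}
    # Bash integer division: jobs 230000..239999 all map to 23.record
    echo "${dir}/$((jobid / 10000)).record"
}

# Print the stored one-line record for a job ID.
# Assumption: each line starts with "JobId=<id> " (note the trailing space,
# which keeps 230988 from also matching 2309881).
lookup_job() {
    local jobid=$1 dir=${2:-/usr/local/slurm/slurmrecord}
    grep -h "^JobId=${jobid} " "$(record_file_for "$jobid" "$dir")"
}

record_file_for 230988   # prints /usr/local/slurm/slurmrecord/23.record
record_file_for 9999     # prints /usr/local/slurm/slurmrecord/0.record
```

A call such as "lookup_job 230988" would then print that job's line, provided the job has already passed through the epilog.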

The file size may be adjusted by changing the divisor (10000 jobs per file here). These 'record' files may then be accessed/analyzed with any text extraction tool of choice.

We have a corresponding archive of job submission scripts, where we've found it more useful to preserve each script in the user-specified format. Note the absence of '--oneliner':

/usr/local/slurm/epilogctld:cat /usr/local/slurm/slurmrecord/tmp-$SLURM_JOB_ID >> /usr/local/slurm/slurmrecord/$((SLURM_JOB_ID/10000)).script

Cheers ~ E.M.

On Thu, Apr 23, 2020 at 5:46 AM mercan <ahmet.mer...@uhem.itu.edu.tr> wrote:

> Sorry, I mistakenly cropped the "mkdir" line from the script below:
>
> mkdir -p $JDIR
>
> It should come after the "JDIR=/okyanus/..." line.
>
> Regards;
>
> Ahmet M.
>
>
> On 23.04.2020 12:31, mercan wrote:
> > Hi;
> >
> > I prefer to use an epilog script to store the job information in a top
> > directory owned by the slurm user. To avoid a directory with a lot of
> > files, it creates a sub-directory for every thousand job files. For a job
> > whose jobid is 230988, it creates a directory named 230XXX.
> > Also, the SLURM_JOB_ID of a job array is a problem, because
> > slurm uses an ugly format (298903_[3%1]). For these reasons, my
> > script is a little complex, but it works (I cropped the other
> > non-relevant things):
> >
> > #!/bin/bash
> >
> > # Use <array job id>_<task id> for array jobs, the plain job id otherwise
> > if [ "x$SLURM_ARRAY_JOB_ID" != "x" ]
> > then
> >   JOBNO="${SLURM_ARRAY_JOB_ID}_${SLURM_ARRAY_TASK_ID}"
> > else
> >   JOBNO="${SLURM_JOB_ID}"
> > fi
> > # Strip everything from the first underscore to get the numeric job id
> > JI=${JOBNO//_*/}
> > JWIDE=${#JI}
> > JIDLEN=0
> > # Number of leading digits to keep (all but the last three)
> > JIDLEN=$((JWIDE-3))
> > JDIR=/okyanus/SLURM/log/jobs/${JI:0:$JIDLEN}XXX
> > echo "===========================================================================" &>>$JDIR/${JI}.txt
> > scontrol show job -dd "$JOBNO" &>>$JDIR/${JI}.txt && echo "===========================================================================" >>$JDIR/${JI}.txt && scontrol write batch_script "$SLURM_JOBID" - >>$JDIR/${JI}.txt
> > exit 0
> >
> > Regards;
> >
> > Ahmet M.
> >
> >
> > On 23.04.2020 10:33, Gestió Servidors wrote:
> >>
> >> Hello,
> >>
> >> When a job is “pending” or “running”, with “scontrol show
> >> jobid=#jobnumber” I can get some useful information, but when the
> >> job has finished, that command doesn’t return anything. For example,
> >> if I run “sacct” and I see that some jobs have finished with state
> >> “FAILED”, how can I get detailed information about those jobs?
> >>
> >> Thanks.
> >>
> >

--
E.M. (Em) Dragowsky, Ph.D.
Research Computing -- UTech
Case Western Reserve University
(216) 368-0082
(currently forwarding to my cell phone)