Hi Loris,
hi Sebastian,

thanks for the information on how you are doing this.
So you are both happily(?) ignoring this warning in the "Prolog and Epilog Guide", right? :-)

"Prolog and Epilog scripts [...] should not call Slurm commands (e.g. squeue, scontrol, sacctmgr, etc)."
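One way to keep Slurm command calls in an epilog to a minimum is to read what Slurm already exports in the environment and call a command only as a fallback. This is a hedged sketch, not anyone's production script. The guide lists `SLURM_JOB_STDOUT`, but Sebastian's message below reports that it was not available in his Epilog, so the fallback is kept. The function name `resolve_stdout` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the job's StdOut path, preferring the
# SLURM_JOB_STDOUT environment variable and running the fallback command
# (passed as arguments) only when that variable is unset or empty.
resolve_stdout() {
    if [ -n "${SLURM_JOB_STDOUT:-}" ]; then
        printf '%s\n' "$SLURM_JOB_STDOUT"
    else
        "$@"
    fi
}

# Usage (not run here); the fallback is the only place a Slurm command is called:
#   lookup_stdout() { scontrol show job "$SLURM_JOB_ID" | awk -F= '/ StdOut/{print $2}'; }
#   StdOut=$(resolve_stdout lookup_stdout)
```

With this shape, a site whose Slurm version does export the variable never runs `scontrol` from the epilog at all.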

May I ask how big your clusters are (number of nodes) and how heavily they are used (submitted jobs per hour)?

Regards,
Hermann

On 9/16/22 9:09 AM, Loris Bennett wrote:
Hi Hermann,

Sebastian Potthoff <s.potth...@uni-muenster.de> writes:

Hi Hermann,

I happened to be following this conversation and had just solved this issue
today. I added this part to the epilog script to make it work:

# Add job report to stdout
StdOut=$(/usr/bin/scontrol show job "$SLURM_JOB_ID" | /usr/bin/grep StdOut | /usr/bin/xargs | /usr/bin/awk 'BEGIN { FS = "=" } ; { print $2 }')

NODELIST=($(/usr/bin/scontrol show hostnames))

# Only append if a StdOut path was found and we are the first node of the job
if [ "$(/usr/bin/hostname -s)" = "${NODELIST[0]}" ] && [ -n "${StdOut}" ]
then
   echo "################################# JOB REPORT ##################################" >> "$StdOut"
   /usr/bin/seff "$SLURM_JOB_ID" >> "$StdOut"
   echo "###############################################################################" >> "$StdOut"
fi
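The StdOut extraction step can be sketched as a small function that reads `scontrol show job` output on stdin, so the parsing can be tried on saved text without a running cluster. A minimal sketch; the name `job_stdout_path` is hypothetical, and it assumes the path contains no whitespace (scontrol separates fields with spaces, so such a path would be ambiguous anyway). Unlike `awk -F=` with `$2`, it keeps everything after the first `=`, so a path that itself contains `=` is not truncated.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the StdOut path from `scontrol show job` output
# read on stdin. Prints nothing (or an empty line) if the field is missing or empty.
job_stdout_path() {
    awk '
        {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^StdOut=/) {
                    sub(/^StdOut=/, "", $i)   # drop only the field name and the first "="
                    print $i
                    exit
                }
            }
        }'
}

# Usage on a live system (not run here):
#   StdOut=$(scontrol show job "$SLURM_JOB_ID" | job_stdout_path)
```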

We do something similar.  At the end of our script pointed to by
EpilogSlurmctld we have

   # job_id and user are presumably set earlier in the script (not shown)
   OUT=$(scontrol show jobid "${job_id}" | awk -F= '/ StdOut/{print $2}')
   if [ ! -f "$OUT" ]; then
     exit
   fi

   printf "\n== Epilog Slurmctld ==================================================\n\n" >> "${OUT}"

   seff "${SLURM_JOB_ID}" >> "${OUT}"

   printf "\n======================================================================\n" >> "${OUT}"

   chown "${user}" "${OUT}"
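The append step above can be wrapped in a small function so the framing lives in one place and can be tried against a temporary file instead of a real job. A minimal sketch, not Loris's actual script; the name `append_job_report` is hypothetical, and `seff` is replaced by whatever command is passed as arguments.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append the output of a command to an existing job
# output file, framed by a header and a footer line. Writes nothing and
# returns success if the file does not exist.
append_job_report() {
    local out="$1"
    shift
    # Same guard as in the script: only append to a file that already exists
    [ -f "$out" ] || return 0
    # Group all writes so the file is opened for appending only once
    {
        printf '\n== Epilog Slurmctld ==================================================\n\n'
        "$@"    # the report command, e.g. seff "$SLURM_JOB_ID"
        printf '\n======================================================================\n'
    } >> "$out"
}
```

On the controller it would be called as `append_job_report "$OUT" seff "$SLURM_JOB_ID"`.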

Cheers,

Loris

   Contrary to what the Slurm docs say
(https://slurm.schedmd.com/prolog_epilog.html), I was not able to use the env var
SLURM_JOB_STDOUT, so I had to fetch it via scontrol. In addition, I had to make
sure it is only called by the "leading" node, as the epilog script is called by
ALL nodes of a multi-node job and they would all call seff and clutter up the
output. The last thing was to check that StdOut is not of length zero (i.e. that
it exists); interactive jobs would otherwise cause the node to drain.
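The two checks described above (run only on the first node, and skip jobs without a StdOut path) can be isolated in one test function. A sketch under the assumption that the caller has already expanded the node list (for example with `scontrol show hostnames`) and looked up the path; the name `should_write_report` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if an output path is known and this host
# is the first node of the job's expanded node list.
#   $1   short hostname of the node running the epilog
#   $2   StdOut path (empty for interactive jobs)
#   $3+  expanded node list, in the order printed by `scontrol show hostnames`
should_write_report() {
    local host="$1" out="$2"
    shift 2
    [ -n "$out" ] && [ "$#" -gt 0 ] && [ "$host" = "$1" ]
}

# Usage inside the epilog (not run here):
#   nodes=($(scontrol show hostnames "$SLURM_JOB_NODELIST"))
#   if should_write_report "$(hostname -s)" "$StdOut" "${nodes[@]}"; then
#       seff "$SLURM_JOB_ID" >> "$StdOut"
#   fi
```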

Maybe this helps.

Kind regards
Sebastian

PS: goslmailer looks quite nice with its recommendations! Will definitely look into it.

--
Westfälische Wilhelms-Universität (WWU) Münster
WWU IT
Sebastian Potthoff (eScience / HPC)

  On 15.09.2022 at 18:07, Hermann Schwärzler
<hermann.schwaerz...@uibk.ac.at> wrote:

  Hi Ole,

  On 9/15/22 5:21 PM, Ole Holm Nielsen wrote:

  On 15-09-2022 16:08, Hermann Schwärzler wrote:

  Just out of curiosity: how do you insert the output of seff into the out-file of a job?

  Use the "smail" tool from the slurm-contribs RPM and set this in slurm.conf:
  MailProg=/usr/bin/smail

  Maybe I am missing something, but from what I can tell smail sends an email
  and does *not* change or append to the .out file of a job...

  Regards,
  Hermann

