Hi Hermann,

>> So you both are happily(?) ignoring this warning in the "Prolog and Epilog
>> Guide", right? :-)
>>
>> "Prolog and Epilog scripts [...] should not call Slurm commands (e.g. squeue,
>> scontrol, sacctmgr, etc)."
>
> We have probably been doing this since before the warning was added to
> the documentation. So we are "ignorantly ignoring" the advice :-/

Same here :) But if $SLURM_JOB_STDOUT is not defined as documented … what can you do.

>> May I ask how big your clusters are (number of nodes) and how heavily they
>> are used (submitted jobs per hour)?

We have around 500 nodes (mostly 2x18 cores). The number of jobs ending (i.e. calling the epilog script) varies quite a lot, between 1000 and 15k a day, so somewhere between 40 and 625 jobs/hour. During those peaks Slurm can become noticeably slower, but usually it runs fine.

Sebastian

> On 16.09.2022 at 15:15, Loris Bennett <loris.benn...@fu-berlin.de> wrote:
>
> Hi Hermann,
>
> Hermann Schwärzler <hermann.schwaerz...@uibk.ac.at> writes:
>
>> Hi Loris,
>> hi Sebastian,
>>
>> thanks for the information on how you are doing this.
>> So you both are happily(?) ignoring this warning in the "Prolog and Epilog
>> Guide", right? :-)
>>
>> "Prolog and Epilog scripts [...] should not call Slurm commands (e.g. squeue,
>> scontrol, sacctmgr, etc)."
>
> We have probably been doing this since before the warning was added to
> the documentation. So we are "ignorantly ignoring" the advice :-/
>
>> May I ask how big your clusters are (number of nodes) and how heavily they
>> are used (submitted jobs per hour)?
>
> We have around 190 32-core nodes. I don't know how I would easily find
> out the average number of jobs per hour. The only problems we have had
> with submission have been when people have written their own mechanisms
> for submitting thousands of jobs. Once we get them to use job arrays,
> such problems generally disappear.
>
> Cheers,
>
> Loris
>
>> Regards,
>> Hermann
>>
>> On 9/16/22 9:09 AM, Loris Bennett wrote:
>>> Hi Hermann,
>>> Sebastian Potthoff <s.potth...@uni-muenster.de> writes:
>>>
>>>> Hi Hermann,
>>>>
>>>> I happened to read along this conversation and was just solving this issue
>>>> today.
>>>> I added this part to the epilog script to make it work:
>>>>
>>>> # Add job report to stdout
>>>> StdOut=$(/usr/bin/scontrol show job=$SLURM_JOB_ID | /usr/bin/grep StdOut |
>>>> /usr/bin/xargs | /usr/bin/awk 'BEGIN { FS = "=" } ; { print $2 }')
>>>>
>>>> NODELIST=($(/usr/bin/scontrol show hostnames))
>>>>
>>>> # Only add to StdOut file if it exists and if we are the first node
>>>> if [ "$(/usr/bin/hostname -s)" = "${NODELIST[0]}" -a ! -z "${StdOut}" ]
>>>> then
>>>>   echo "################################# JOB REPORT ##################################" >> $StdOut
>>>>   /usr/bin/seff $SLURM_JOB_ID >> $StdOut
>>>>   echo "###############################################################################" >> $StdOut
>>>> fi
>>>
>>> We do something similar. At the end of our script pointed to by
>>> EpilogSlurmctld we have
>>>
>>> OUT=`scontrol show jobid ${job_id} | awk -F= '/ StdOut/{print $2}'`
>>> if [ ! -f "$OUT" ]; then
>>>   exit
>>> fi
>>> printf "\n== Epilog Slurmctld ==================================================\n\n" >> ${OUT}
>>> seff ${SLURM_JOB_ID} >> ${OUT}
>>> printf "\n======================================================================\n" >> ${OUT}
>>> chown ${user} ${OUT}
>>>
>>> Cheers,
>>> Loris
>>>
>>>> Contrary to what it says in the slurm docs
>>>> https://slurm.schedmd.com/prolog_epilog.html I was not able to use the
>>>> env var SLURM_JOB_STDOUT, so I had to fetch it via scontrol. In addition I
>>>> had to make sure it is only called by the „leading“ node as the epilog
>>>> script will be called by ALL nodes of a multinode job and they would all
>>>> call seff and clutter up the output. Last thing was to check if StdOut is
>>>> not of length zero (i.e. it exists). Interactive jobs would otherwise
>>>> cause the node to drain.
>>>>
>>>> Maybe this helps.
>>>>
>>>> Kind regards
>>>> Sebastian
>>>>
>>>> PS: goslmailer looks quite nice with its recommendations! Will definitely
>>>> look into it.
>>>>
>>>> --
>>>> Westfälische Wilhelms-Universität (WWU) Münster
>>>> WWU IT
>>>> Sebastian Potthoff (eScience / HPC)
>>>>
>>>> On 15.09.2022 at 18:07, Hermann Schwärzler
>>>> <hermann.schwaerz...@uibk.ac.at> wrote:
>>>>
>>>> Hi Ole,
>>>>
>>>> On 9/15/22 5:21 PM, Ole Holm Nielsen wrote:
>>>>
>>>> On 15-09-2022 16:08, Hermann Schwärzler wrote:
>>>>
>>>> Just out of curiosity: how do you insert the output of seff into the
>>>> out-file of a job?
>>>>
>>>> Use the "smail" tool from the slurm-contribs RPM and set this in
>>>> slurm.conf:
>>>> MailProg=/usr/bin/smail
>>>>
>>>> Maybe I am missing something but from what I can tell smail sends an
>>>> email and does *not* change or append to the .out file of a job...
>>>>
>>>> Regards,
>>>> Hermann
>>>
>>
>
> --
> Dr. Loris Bennett (Herr/Mr)
> ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
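
For reference, here is a minimal sketch that combines the checks from the two epilog snippets quoted above: fetch the StdOut path via scontrol, act only on the first node of the job, and skip jobs without a usable output file. The function names (extract_stdout_path, should_append_report) are made up for illustration and are not part of Slurm. The sketch assumes the StdOut= field sits on its own line in the `scontrol show job` output, and that scontrol, seff and the job's output file are reachable from the node running the epilog. It still calls Slurm commands from the epilog, so the warning from the "Prolog and Epilog Guide" applies to it as well.

```shell
#!/bin/bash
# Sketch only: helper names are hypothetical, not part of Slurm.

# Read `scontrol show job` text on stdin and print the value of StdOut=.
# Stripping only the "StdOut=" prefix keeps paths that contain "=" intact.
extract_stdout_path() {
    sed -n 's/^[[:space:]]*StdOut=//p' | head -n 1
}

# Succeed only if this host is the job's first node and a path was found.
# Arguments: <this host> <first node of the job> <StdOut path>
should_append_report() {
    local this_host=$1 first_node=$2 out_path=$3
    [ "$this_host" = "$first_node" ] && [ -n "$out_path" ]
}

# Only touch Slurm when actually running inside an epilog with a job id.
if [ -n "${SLURM_JOB_ID:-}" ] && command -v scontrol >/dev/null 2>&1; then
    out_path=$(scontrol show job "$SLURM_JOB_ID" | extract_stdout_path)
    first_node=$(scontrol show hostnames | head -n 1)

    # The extra -f test mirrors the second snippet: interactive jobs have
    # no regular output file, so there is nothing to append to.
    if should_append_report "$(hostname -s)" "$first_node" "$out_path" \
        && [ -f "$out_path" ]; then
        {
            echo "############################ JOB REPORT ############################"
            seff "$SLURM_JOB_ID"
            echo "####################################################################"
        } >> "$out_path"
    fi
fi
```

Because the parsing and the first-node decision are split into functions, they can be tried on saved `scontrol show job` output without a running cluster.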