Hi,

Today I spotted a job which requested an entire node, then had to wait for around 16 hours and finally ran, apparently successfully, for less than 4 minutes.
As it currently seems to be generally fashionable for users round here to request the maximum number of cores available on a node without doing any scaling experiments or considering backfill, it seems like it would be a good idea to provide them with some feedback on wait and run times.

One option would be to write the information into the Slurm 'out' file (currently we insert the output of 'seff' there). Another option would be to aggregate the times over, say, a month and provide the absolute totals and maybe a run-to-wait ratio.

Has anyone already done anything like this?

Cheers,

Loris

-- 
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin
Email loris.benn...@fu-berlin.de
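For the monthly aggregation, something along the following lines might be a starting point. It is only a sketch, and it makes several assumptions. The records are taken to come from `sacct` run with `-X -P -n` and `--format=User,Submit,Start,End`, which gives one pipe-separated line per job allocation. Timestamps are assumed to use Slurm's default `YYYY-MM-DDTHH:MM:SS` format. The function name `summarise` is hypothetical. Jobs that have not started or finished report `Unknown` or `None` instead of a timestamp, and the sketch simply skips them.

```python
from collections import defaultdict
from datetime import datetime

# Assumed input: output of something like
#   sacct -X -P -n -S 2024-01-01 -E 2024-02-01 --format=User,Submit,Start,End
# i.e. one "user|submit|start|end" line per job allocation.
TIME_FMT = "%Y-%m-%dT%H:%M:%S"  # assumed default Slurm time format


def summarise(sacct_output: str) -> dict:
    """Hypothetical helper: return {user: (total_wait_s, total_run_s, run_to_wait)}."""
    totals = defaultdict(lambda: [0.0, 0.0])  # user -> [wait seconds, run seconds]
    for line in sacct_output.strip().splitlines():
        if not line:
            continue
        user, submit, start, end = line.split("|")
        try:
            t_submit = datetime.strptime(submit, TIME_FMT)
            t_start = datetime.strptime(start, TIME_FMT)
            t_end = datetime.strptime(end, TIME_FMT)
        except ValueError:
            # Pending or running jobs have "Unknown"/"None" here; ignore them.
            continue
        totals[user][0] += (t_start - t_submit).total_seconds()  # time queued
        totals[user][1] += (t_end - t_start).total_seconds()     # time running
    result = {}
    for user, (wait, run) in totals.items():
        # Ratio of total run time to total wait time; infinite if nothing waited.
        ratio = run / wait if wait > 0 else float("inf")
        result[user] = (wait, run, ratio)
    return result


if __name__ == "__main__":
    # Hypothetical records: one job like the one described above, plus a
    # job that is still running and is therefore skipped.
    sample = (
        "loris|2024-01-15T08:00:00|2024-01-16T00:00:00|2024-01-16T00:04:00\n"
        "bob|2024-01-15T08:00:00|2024-01-15T09:00:00|Unknown\n"
    )
    for user, (wait, run, ratio) in summarise(sample).items():
        print(f"{user}: waited {wait / 3600:.1f} h, ran {run / 60:.1f} min, "
              f"run/wait = {ratio:.4f}")
```

With the sample job (16 hours of waiting, 4 minutes of running) the run-to-wait ratio comes out at about 0.004, which is exactly the kind of figure that might prompt a user to reconsider requesting a whole node.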