From my reading of the Slurm documentation, exceeding a limit set in GrpTRESMins should terminate running jobs. In my testing, however, the ‘current value’ of GrpTRESMins is only updated when a job completes, not while the job is running, so running jobs aren’t being stopped. On the positive side, no new jobs are started once the limit is exceeded. Here’s the documentation passage that is confusing me:
“If any limit is reached, all running jobs with that TRES in this group will be killed, and no new jobs will be allowed to run.”

Perhaps I’m missing a setting, or I’ve misconfigured something. Thanks in advance!