Hey, folks. Some of my users submit job after job without regard for our 1000 CPU-day (1440000 CPU-minute) TRES limit, and thus their later jobs get blocked with the reason AssocGrpCPURunMinutesLimit.
I’ve written up a script [1], using Ole Holm Nielsen’s showuserlimits script [2], that identifies a user’s smallest-resource blocked job and predicts when that job might run at current resource consumption rates. Non-root users can query their own blocked jobs, and root can query anyone’s.

Example runs:

=====

# guessblockedjobstart someusername
Next blocked job to run should be 551294, with 188160 CPU-minute(s) requested
- Limit for running and queued jobs is 1440000 CPU-minutes
- Running and pending jobs have 1364937 CPU-minutes remaining
- Leaving 75063 CPU-minutes available currently
- Smallest blocked job, 551294, requested 188160 CPU-minutes (14 CPU(s) on 1 node(s) for 13440 minute(s))
- Currently-running jobs release 7560 CPU-minutes per hour of elapsed time
Estimated time for job 551294 to enter queue is Fri Jan 24 07:14 CST 2020, if resources are available

# guessblockedjobstart anotherusername
User anotherusername has no blocked jobs

=====

Let me know if there are any questions or problems found. Thanks.

[1] https://gist.github.com/mikerenfro/4d21fee5cd6c82b16e30c46fb2bf3226
[2] https://github.com/OleHolmNielsen/Slurm_tools/tree/master/showuserlimits

-- 
Mike Renfro, PhD / HPC Systems Administrator, Information Technology Services
931 372-3601 / Tennessee Tech University
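
For readers who want to check the numbers in the example run above, the arithmetic behind the estimate can be sketched roughly as follows. This is a minimal Python illustration and not the code from the gist: the function name estimate_start is hypothetical, the query time is an assumed value chosen only for the example, and the sketch assumes the release rate stays constant (no running job finishes early and no new job starts).

```python
from datetime import datetime, timedelta

def estimate_start(limit, remaining, requested, release_per_hour, now):
    """Hypothetical helper: estimate when a blocked job fits under the limit.

    All quantities are in CPU-minutes, except release_per_hour
    (CPU-minutes released per hour of elapsed time).
    """
    available = limit - remaining        # headroom under the limit right now
    shortfall = requested - available    # CPU-minutes still to be released
    if shortfall <= 0:
        return now                       # the job already fits
    if release_per_hour <= 0:
        return None                      # nothing running, so no estimate
    return now + timedelta(hours=shortfall / release_per_hour)

# Figures taken from the example run above.
now = datetime(2020, 1, 23, 16, 0)       # assumed query time, illustration only
eta = estimate_start(1_440_000, 1_364_937, 188_160, 7_560, now)
wait_hours = (eta - now).total_seconds() / 3600
print(f"Shortfall cleared in about {wait_hours:.1f} hours")
```

With the figures from the example, the job is 113097 CPU-minutes short, which at 7560 CPU-minutes per hour works out to roughly 15 hours of waiting. As in the script output, that is only when the job could enter the queue; it still needs free resources to start.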