Re: [slurm-users] Limit run time of interactive jobs
Hi,

Bjørn-Helge Mevik writes:

> Wouldn't it be simpler to just refuse too long interactive jobs in
> job_submit.lua?

Yes, I guess so. I proposed the idea of having different partitions because then the constraints live at the partition level, which is probably easier to manage than modifying the job_submit.lua script, but either way should give the same result.

Anyway, the goal was to point to the job_submit.lua mechanism, because without it, even if you create separate partitions for batch and interactive jobs, it is not possible (or at least I wouldn't know how) to enforce a certain policy only for interactive jobs.

Cheers,
--
Ángel de Vicente
Research Software Engineer (Supercomputing and BigData)
Tel.: +34 922-605-747
Web.: http://research.iac.es/proyecto/polmag/
GPG: 0x8BDC390B69033F52
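As an illustration of the partition-level approach discussed above (not from the original message), a time limit on an interactive-only partition might look roughly like this in slurm.conf; node names and limits are placeholders:

    # slurm.conf sketch: cap run time on the interactive partition only
    PartitionName=interactive Nodes=node[001-010] MaxTime=01:00:00   Default=NO
    PartitionName=batch       Nodes=node[001-010] MaxTime=7-00:00:00 Default=YES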
[slurm-users] New future and roadmap for Slurm-web
Hi Slurm community,

Slurm-web is an open source web interface for the Slurm workload manager: http://rackslab.github.io/slurm-web/

The project was born in 2015 (*), it was originally funded by EDF [2] (huge thanks to them!) and it reached a nice and unique feature set with versions 2.x. Unfortunately, the software has suffered in recent years from reduced maintenance and investment.

Today, Slurm-web is being endorsed by Rackslab [3], a small company focused on the development of open source solutions for HPC operations, which becomes its new official maintainer. An ambitious new roadmap has been defined with a long-term vision for this project, starting with version 3.0 coming later this year. In addition to the existing Slurm-web feature set, the following new features are planned:

- Near real-time updates of the dashboard
- Accounting reports and visualisation of past jobs
- Built-in metrics about jobs and scheduling
- Job submission and inspection
- Vastly improved Gantt view
- GPGPU support
- QOS, associations and reservations management
- Native RPM/deb packages and containers for easy deployment on most Linux distributions

The software architecture will be reworked with modern, established technologies; notably, it will be based on the reference slurmrestd REST API. The source code will remain free, published under GPLv3, in line with Rackslab's commitment to the free software community. Our goal is clearly to build the reference open source web interface for all users of Slurm-based HPC clusters.

More details about the roadmap have been published in the project discussions on GitHub: https://github.com/rackslab/slurm-web/discussions/235

You are more than welcome to discuss it there, ask questions and give comments!

Best regards,

(*) The original announcement can still be found in the archives of this mailing list! [1]

[1] https://groups.google.com/g/slurm-users/c/LiD2Pa8r22A/m/fDHWm5GomJsJ
[2] https://www.edf.fr/en
[3] https://rackslab.io

--
Rémi Palancher
Rackslab: Open Source Solutions for HPC Operations
https://rackslab.io
Re: [slurm-users] Limit run time of interactive jobs
On 5/8/23 08:39, Bjørn-Helge Mevik wrote:
> Angel de Vicente writes:
>
>> But one possible way to something similar is to have a partition only
>> for interactive jobs and a different partition for batch jobs, and then
>> enforce that each job uses the right partition. In order to do this, I
>> think we can use the Lua contrib module (check the job_submit.lua
>> example).
>
> Wouldn't it be simpler to just refuse too long interactive jobs in
> job_submit.lua?

This sounds like a good idea, but how would one identify an interactive job in the job_submit.lua script?

A solution was suggested in https://serverfault.com/questions/1090689/how-can-i-set-up-interactive-job-only-or-batch-job-only-partition-on-a-slurm-clu

> Interactive jobs have no script and job_desc.script will be empty / not set.

So maybe something like this code snippet?

    if job_desc.script == NIL then
       -- This is an interactive job
       -- make checks of job timelimit
       if job_desc.time_limit > 3600 then
          slurm.log_user("NOTICE: Interactive jobs are limited to 3600 seconds")
          -- ESLURM_INVALID_TIME_LIMIT in slurm_errno.h
          return 2051
       end
    end

/Ole
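For reference, a hedged variant of the snippet above (not from the thread): it assumes job_desc.time_limit is expressed in minutes, as Slurm time limits usually are, uses lowercase nil (the idiomatic Lua spelling), and treats the 60-minute cap as an example value only:

    -- sketch: reject interactive jobs longer than 60 minutes
    if job_desc.script == nil or job_desc.script == '' then
       -- no batch script: treat the job as interactive (salloc/srun)
       -- note: an unset time limit may appear as a huge sentinel value (NO_VAL),
       -- which a production script would want to handle separately
       if job_desc.time_limit > 60 then
          slurm.log_user("NOTICE: interactive jobs are limited to 60 minutes")
          return 2051  -- ESLURM_INVALID_TIME_LIMIT in slurm_errno.h
       end
    end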
Re: [slurm-users] Limit run time of interactive jobs
Ole Holm Nielsen writes:

> On 5/8/23 08:39, Bjørn-Helge Mevik wrote:
>> Angel de Vicente writes:
>>
>>> But one possible way to something similar is to have a partition only
>>> for interactive jobs and a different partition for batch jobs, and then
>>> enforce that each job uses the right partition. In order to do this, I
>>> think we can use the Lua contrib module (check the job_submit.lua
>>> example).
>>
>> Wouldn't it be simpler to just refuse too long interactive jobs in
>> job_submit.lua?
>
> This sounds like a good idea, but how would one identify an
> interactive job in the job_submit.lua script?

Good question. :) I merely guessed it is possible. :)

> A solution was suggested in
> https://serverfault.com/questions/1090689/how-can-i-set-up-interactive-job-only-or-batch-job-only-partition-on-a-slurm-clu
>
>> Interactive jobs have no script and job_desc.script will be empty /
>> not set.
>
> So maybe something like this code snippet?
>
>     if job_desc.script == NIL then

That sounds like it should work, yes. (But perhaps double check that jobs submitted with "sbatch --wrap" or taking the job script from stdin (if that is still possible) get job_desc.script set.)

--
B/H
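One way to double-check that empirically (a sketch, not from the thread) is to temporarily log the field from job_submit.lua and then submit test jobs via srun, "sbatch --wrap" and sbatch reading from stdin:

    -- temporary debug line inside slurm_job_submit(), to verify whether
    -- "sbatch --wrap" and stdin submissions populate job_desc.script
    slurm.log_info("job_submit: job_desc.script is set: %s",
                   tostring(job_desc.script ~= nil and job_desc.script ~= ''))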
Re: [slurm-users] Best way to accurately calculate the CPU usage of an account when using fairshare?
I would recommend standing up an instance of XDMoD, as it handles most of this for you in its summary reports: https://open.xdmod.org/10.0/index.html

-Paul Edmon-

On 5/3/23 2:05 PM, Joseph Francisco Guzman wrote:

Good morning,

We have at least one billed account right now, where the associated researchers are able to submit jobs that run against our normal queue with fairshare, but not for an academic research purpose. So we'd like to accurately calculate their CPU hours.

We are currently using a script to query the db with sacct and sum up the value of ElapsedRaw * AllocCPUS for all jobs. But this seems limited, because requeueing will create what the sacct man page calls duplicates. By default, jobs normally get requeued only if there's something outside of the user's control, like a NODE_FAIL or an scontrol command to requeue them manually; although I think users can requeue things themselves, it's not a feature we've seen our researchers use. However, with the new scrontab feature, whenever the cron is executed more than once, sacct reports that the previous jobs are "requeued" and they are only visible by looking up duplicates.

I haven't seen any billed account use requeueing or scrontab yet, but it's clear to me that it could be significant once researchers start using scrontab more. Scrontab has existed since one of the releases from 2020, I believe, but we enabled it this year and see it as much more powerful than the traditional Linux crontab.

What would be the best way to more thoroughly calculate ElapsedRaw * AllocCPUS, to account for duplicates, but optionally ignore unintentional requeueing like from a NODE_FAIL?

Here's the main loop of the simple bash script I have now:

    while IFS='|' read -r end elapsed cpus; do
        # if a job crosses the month barrier
        # the entire bill will be put under the 2nd month
        year_month="${end:0:7}"
        if [[ ! "$elapsed" =~ ^[0-9]+$ ]] || [[ ! "$cpus" =~ ^[0-9]+$ ]]; then
            continue
        fi
        core_seconds["$year_month"]=$(( core_seconds["$year_month"] + (elapsed * cpus) ))
    done < <(sacct -a -A "$SLURM_ACCOUNT" \
                   -S "$START_DATE" \
                   -E "$END_DATE" \
                   -o End,ElapsedRaw,AllocCPUS -X -P --noheader)

Our slurmdbd is configured to keep 6 months of data. It makes sense to loop through the job IDs instead, using sacct's -D/--duplicates option each time to reveal the hidden duplicates in the REQUEUED state, but I'm interested if there are alternatives or if I'm missing anything here.

Thanks,

Joseph

--
Joseph F. Guzman - ITS (Advanced Research Computing)
Northern Arizona University
joseph.f.guz...@nau.edu
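One possible extension of the loop above (a sketch, not from the thread): pass -D to include requeued duplicate records, add the State field to the output, and skip records in states you decide not to bill, e.g. NODE_FAIL; whether REQUEUED duplicates should count at all is a local policy choice:

    declare -A core_seconds

    while IFS='|' read -r end elapsed cpus state; do
        year_month="${end:0:7}"
        # skip malformed rows
        if [[ ! "$elapsed" =~ ^[0-9]+$ ]] || [[ ! "$cpus" =~ ^[0-9]+$ ]]; then
            continue
        fi
        # skip records in states we choose not to bill (policy choice)
        case "$state" in
            NODE_FAIL*) continue ;;
        esac
        core_seconds["$year_month"]=$(( core_seconds["$year_month"] + (elapsed * cpus) ))
    done < <(sacct -a -A "$SLURM_ACCOUNT" \
                   -S "$START_DATE" \
                   -E "$END_DATE" \
                   -D \
                   -o End,ElapsedRaw,AllocCPUS,State -X -P --noheader)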
Re: [slurm-users] Limit run time of interactive jobs
Hello,

Bjørn-Helge Mevik writes:

>> A solution was suggested in
>> https://serverfault.com/questions/1090689/how-can-i-set-up-interactive-job-only-or-batch-job-only-partition-on-a-slurm-clu
>>
>>> Interactive jobs have no script and job_desc.script will be empty /
>>> not set.
>>
>> So maybe something like this code snippet?
>>
>> if job_desc.script == NIL then

In my case (merely a variation on some older post here at slurm-users), I'm using the following to make sure jobs go to the right queue (either 'batch' or 'interactive'), and it seems to work just fine:

    if (job_desc.script == nil or job_desc.script == '') then
       if (job_desc.partition ~= interactive_partition) then
          job_desc.partition = interactive_partition
          slurm.log_user("%s: normal job seems to be interactive, moved to %s partition.",
                         log_prefix, job_desc.partition)
       end
    else
       if (job_desc.partition == interactive_partition) then
          job_desc.partition = batch_partition
          slurm.log_user("%s: batch jobs cannot be run in the interactive partition, moved to %s partition.",
                         log_prefix, job_desc.partition)
       end
    end

--
Ángel de Vicente
Research Software Engineer (Supercomputing and BigData)
Tel.: +34 922-605-747
Web.: http://research.iac.es/proyecto/polmag/
GPG: 0x8BDC390B69033F52
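The snippet above relies on names defined elsewhere in the script (interactive_partition, batch_partition, log_prefix). A minimal skeleton of how it might be embedded in a job_submit.lua (a sketch; the partition names and prefix are placeholder assumptions, not values from the thread):

    -- minimal job_submit.lua skeleton (sketch)
    local interactive_partition = "interactive"
    local batch_partition = "batch"
    local log_prefix = "job_submit.lua"

    function slurm_job_submit(job_desc, part_list, submit_uid)
       -- ... partition-routing snippet from the message above goes here ...
       return slurm.SUCCESS
    end

    function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
       return slurm.SUCCESS
    end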