[slurm-users] sprio prints an incomplete list of pending jobs

2020-09-16 Thread SJTU
Hi, I’m using sprio of SLURM 19.05 to inspect job queuing on my cluster. I found that sprio prints an incomplete list of pending jobs, far fewer than `squeue --state=pending` shows. sprio does not seem to offer any extra options for this. I would appreciate any suggestions. Thank you! Jianwen [root@

[slurm-users] SLURM launching jobs onto nodes with suspended jobs may lead to resource contention

2020-09-16 Thread SJTU
Hi, I am using SLURM 19.05 and found that SLURM may launch jobs onto nodes with suspended jobs, which leads to resource contention once the suspended jobs are resumed. Steps to reproduce this issue are: 1. Launch 40 one-core jobs on a 40-core compute node. 2. Suspend all 40 jobs on that comp

Re: [slurm-users] [Support] SLURM launching jobs onto nodes with suspended jobs may lead to resource contention

2020-09-16 Thread SJTU
d. > And issue the SIGSTOP or SIGCONT. > > Frankly I wish suspend didn't work like this. It should work where it > suspends the job and does not release the cpus but keeps them reserved. > That's the natural understanding of suspend, but that's not the way suspend

Re: [slurm-users] [Support] sprio prints an incomplete list of pending jobs

2020-09-16 Thread SJTU
The pending jobs missing from the `sprio` output had their priority set manually beforehand. I think that explains why they disappear. Best, Jianwen > On Sep 16, 2020, at 4:00 PM, SJTU wrote: > > Hi, > >I’m using sprio of SLURM 19.05 to inspect job queuing on my cluster. I >
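
To see exactly which pending jobs are absent from sprio, the job IDs from the two commands can be compared. The following is a minimal sketch, not part of the original thread. It assumes the two inputs were captured with something like `squeue --state=pending --noheader --format=%i` and `sprio --noheader --format=%i`, one job ID per line. The function name `missing_from_sprio` is hypothetical, and job arrays, which the two tools may display differently, are not handled.

```python
from typing import List


def missing_from_sprio(squeue_text: str, sprio_text: str) -> List[str]:
    """Return job IDs listed by squeue but absent from sprio, in squeue order.

    Both arguments are assumed to hold one job ID per line (hypothetical
    capture format; see the lead-in for the assumed command options).
    """
    # Collect the IDs sprio reported, ignoring blank lines and padding.
    sprio_ids = {line.strip() for line in sprio_text.splitlines() if line.strip()}

    missing = []
    for line in squeue_text.splitlines():
        job_id = line.strip()
        if job_id and job_id not in sprio_ids:
            missing.append(job_id)
    return missing


if __name__ == "__main__":
    # Illustrative IDs only, not real cluster output.
    print(missing_from_sprio("101\n102\n103\n", "101\n103\n"))
```

The IDs it returns can then be checked with `scontrol show job` to see whether their priority was set by hand.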

[slurm-users] Mocking SLURM to debug job_submit.lua

2020-09-23 Thread SJTU
Hi, Modifying and testing job_submit.lua on a production SLURM system may cause temporary job submission failures, which holds up the rollout of new scheduling strategies. Is it possible to mock a SLURM system to debug job_submit.lua so that it can be updated to the production system confident

[slurm-users] How to set association factor in Multifactor Priority

2020-09-23 Thread SJTU
Hi, I found that a new "Association Factor" is introduced in 19.05 to be part of Job_priority calculation. Can I set it for each SLURM account so job priority can be differentiated based on job accounts? https://groups.google.com/g/slurm-users/c/nzF8jOPZI_w/m/vj2wkUryBgAJ

[slurm-users] How does SLURM calculate StartTime for pending jobs

2020-10-10 Thread SJTU
Hi, `scontrol show jobid xxx` shows SLURM's estimate of StartTime for a pending job. I wonder where in the source code the StartTime calculation is implemented. Thank you! Jianwen

[slurm-users] Limit usage outside reservation

2020-10-20 Thread SJTU
Hi, We reserved compute node resources on SLURM for specific users and hope they will make good use of them. But in some cases users forget the '--reservation' parameter in their job scripts and end up competing with other users outside the reserved nodes. Is there a recommended way to limit users' usage *OUTSIDE

[slurm-users] Raise the priority of a certain kind of jobs

2020-11-12 Thread SJTU
Hello, We want to raise the priority of a certain kind of SLURM job. We considered doing it in Prolog, but Prolog seems to run only when a job starts, so it may not help for queued jobs. Is there any way to do this? Thank you, and we look forward to your reply. Best, Jianwen

[slurm-users] Set a random offset when starting node health check in SLURM

2020-11-26 Thread SJTU
Hi, We use HealthCheckProgram = /usr/sbin/nhc in SLURM to check node health every 600 seconds. However, some NHC checks point to the same central resource, so starting these checks simultaneously may lead to false alarms of service degradation. Is it possible to set a random offset to when
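
One way to spread the checks out, sketched here as an assumption and not taken from the thread, is to have each node wait for its own delay before invoking NHC. The sketch below derives a stable per-node offset from the hostname, which is a deterministic spread and not a truly random one. The function name `health_check_offset` and the 60-second default are hypothetical.

```python
import hashlib


def health_check_offset(hostname: str, max_offset_seconds: int = 60) -> int:
    """Map a hostname to a stable delay in the range [0, max_offset_seconds).

    The same hostname always yields the same delay, so each node keeps
    its own slot from one health-check cycle to the next.
    """
    if max_offset_seconds <= 0:
        return 0
    # Hash the hostname and reduce the first 8 bytes to the allowed range.
    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % max_offset_seconds


if __name__ == "__main__":
    import socket

    print(health_check_offset(socket.gethostname()))
```

A small wrapper script could sleep for this many seconds and then run /usr/sbin/nhc, with the wrapper configured as HealthCheckProgram. The maximum offset would need to stay well below the 600-second interval and below any timeout that applies to the health check program.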

[slurm-users] Probing CPU and memory usage via seff on running jobs

2020-11-29 Thread SJTU
Hi, Is it possible to probe CPU and memory usage via seff on running jobs? Thank you! Jianwen

[slurm-users] Insert separating characters into sacct formatted output

2021-02-09 Thread SJTU
Hi, I am using SLURM 19.05.7. Is it possible to insert user-defined separating characters like "|" or "," into sacct's formatted output? That would make it easier to parse fields. Thank you! Jianwen
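
For reference, sacct has a `--parsable2` (`-P`) option that separates fields with "|" and a `--delimiter` option that changes the separator. The following is a minimal sketch of parsing such output, assuming it was produced by something like `sacct --parsable2 --format=JobID,State,Elapsed` with the header line kept. The function name `parse_sacct` and the sample rows are hypothetical, and a field that itself contains the delimiter (a job name, for example) would break this simple split.

```python
import csv
import io
from typing import Dict, List


def parse_sacct(text: str, delimiter: str = "|") -> List[Dict[str, str]]:
    """Parse delimiter-separated sacct output into one dict per row.

    The first line is expected to be the header row naming the fields.
    """
    # QUOTE_NONE: treat quote characters in fields as ordinary text.
    reader = csv.DictReader(
        io.StringIO(text), delimiter=delimiter, quoting=csv.QUOTE_NONE
    )
    return [dict(row) for row in reader]


if __name__ == "__main__":
    # Illustrative sample, not real accounting data.
    sample = "JobID|State|Elapsed\n123|COMPLETED|00:01:02\n"
    for row in parse_sacct(sample):
        print(row["JobID"], row["State"], row["Elapsed"])
```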