[slurm-users] Re: Problems with gres.conf

2024-06-04 Thread Patryk Bełzak via slurm-users
Hi, I believe that setting Cores explicitly in gres.conf gives you better control over the hardware configuration; I wouldn't trust Slurm on that one. We have gres.conf set up with "Cores". All you have to do is proper NUMA discovery (as long as your hardware has NUMA), and then assign correct co
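For illustration, a gres.conf that pins each GPU to the cores of its local NUMA node might look like the sketch below. The node name, GPU type, device paths, and core ranges are hypothetical and would have to come from your own NUMA discovery (for example from lscpu or nvidia-smi topo -m); they are not taken from the poster's cluster.

```
# gres.conf (fragment) -- all values below are hypothetical placeholders
# GPU 0 is assumed to sit on NUMA node 0, GPU 1 on NUMA node 1
NodeName=node01 Name=gpu Type=a100 File=/dev/nvidia0 Cores=0-15
NodeName=node01 Name=gpu Type=a100 File=/dev/nvidia1 Cores=16-31
```

The core ranges should be checked against the socket and core layout Slurm reports for the node (scontrol show node), since a mismatch can leave GPUs bound to the wrong cores.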

[slurm-users] diagnosing why interactive/non-interactive job waits are so long with State=MIXED

2024-06-04 Thread Robert Kudyba via slurm-users
At the moment we have 2 nodes with long wait times. Generally this happens when the nodes are fully allocated. What other reasons could there be for a job taking so long if there is still enough memory and CPU available? Slurm version is 23.02.4 via Bright Computing. Note the c

[slurm-users] Re: diagnosing why interactive/non-interactive job waits are so long with State=MIXED

2024-06-04 Thread Ryan Novosielski via slurm-users
This is relatively true of my system as well, and I believe it’s because the backfill scheduler is slower than the main scheduler.

[slurm-users] Re: diagnosing why interactive/non-interactive job waits are so long with State=MIXED

2024-06-04 Thread Robert Kudyba via slurm-users
Thanks for the quick response, Ryan! Are there any recommendations for bf_ options from https://slurm.schedmd.com/sched_config.html that could help with this? bf_continue? Decreasing bf_interval to a value lower than 30?

[slurm-users] Re: diagnosing why interactive/non-interactive job waits are so long with State=MIXED

2024-06-04 Thread Ryan Novosielski via slurm-users
We do have bf_continue set, and also bf_max_job_user=50, because we discovered that one user can submit so many jobs that the backfill scheduler hits the limit on the number of jobs it will consider and does not run some jobs that it could otherwise run.
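The backfill options discussed in this thread are set through SchedulerParameters in slurm.conf. A minimal sketch, assuming only the values quoted in the messages above (bf_interval=30 is the figure Robert mentions, shown here as a starting point rather than a recommendation):

```
# slurm.conf (fragment) -- values taken from the thread, not tuned advice
SchedulerType=sched/backfill
SchedulerParameters=bf_continue,bf_max_job_user=50,bf_interval=30
```

After editing, scontrol reconfigure applies the change, and sdiag reports backfill cycle statistics that can help judge whether the backfill scheduler is keeping up.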