Hi,
I believe that setting cores explicitly in gres.conf gives you better control
over the hardware configuration; I wouldn't trust Slurm to work that out on its own.
We have gres.conf set up with "Cores" as well. All you have to do is proper NUMA
discovery (as long as your hardware has NUMA) and then assign the correct cores.
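For example, this is roughly what that looks like for a two-socket node with one
GPU per NUMA domain (the node names, GPU type, device files, and core ranges here
are made up; take the real ranges from "lscpu" or "nvidia-smi topo -m" on your
hardware):

    # gres.conf (illustrative values only)
    # Core ranges must match the NUMA topology reported for each device.
    NodeName=node[01-02] Name=gpu Type=a100 File=/dev/nvidia0 Cores=0-15
    NodeName=node[01-02] Name=gpu Type=a100 File=/dev/nvidia1 Cores=16-31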
At the moment we have 2 nodes with long wait times. Generally this happens when
the nodes are fully allocated. What other reasons would cause a job to wait so
long to start if there is still enough memory and CPU available? Slurm version
is 23.02.4 via Bright Computing.
This is largely true of my system as well, and I believe it's because the
backfill scheduler is slower than the main scheduler.
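One quick thing to check is the Reason that Slurm attaches to the pending jobs;
it usually says why they aren't starting (Resources, Priority, Dependency, QOS or
association limits, and so on). Something along these lines shows it (the job ID
below is just a placeholder):

    squeue -t PENDING -o "%.10i %.9P %.10u %.12r %.20S"
    scontrol show job 123456 | grep -E "Reason|StartTime"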
--
#BlackLivesMatter
Ryan Novosielski - novos...@rutgers.edu
Rutgers, the State University of New Jersey
Thanks for the quick response, Ryan!
Are there any recommendations for bf_* options from
https://slurm.schedmd.com/sched_config.html that could help with this?
bf_continue? Decreasing bf_interval to a value lower than the default of 30 seconds?
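(For reference, whatever is currently in effect can be checked with

    scontrol show config | grep SchedulerParameters

since all of the bf_* options live on the SchedulerParameters line in slurm.conf.)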
We do have bf_continue set, and also bf_max_job_user=50, because we discovered
that one user can submit so many jobs that the backfill scheduler hits the limit
on the number of jobs it will consider and then doesn't run some jobs that it
otherwise could.
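In slurm.conf terms that's something along the lines of the following (the exact
combination will differ per site; other options can be appended to the same
comma-separated list):

    SchedulerParameters=bf_continue,bf_max_job_user=50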