Re: [slurm-users] Longer queuing times for larger jobs

2020-02-05 Thread Antony Cleave
Hi, from what you are describing it sounds like jobs are backfilling in front and stopping the large jobs from starting. You probably need to tweak your backfill window in SchedulerParameters in slurm.conf. See here: *bf_window=#* The number of minutes into the future to look when considering jobs to
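As a rough illustration of that suggestion, a SchedulerParameters line might look like the sketch below; the values are assumptions, and bf_window should generally be sized to cover the longest allowed job time limit.

  SchedulerParameters=bf_window=4320,bf_resolution=600,bf_max_job_user=20
  # bf_window is in minutes (4320 = 3 days), so backfill can plan far enough
  # ahead to hold resources for a large pending job instead of letting small
  # jobs keep jumping in front of it. bf_resolution (seconds) coarsens the
  # time grid so the wider window stays cheap to evaluate.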

[slurm-users] How to use Autodetect=nvml in gres.conf

2020-02-05 Thread Dean Schulze
I need to dynamically configure gpus on my nodes. The gres.conf doc says to use Autodetect=nvml in gres.conf instead of adding configuration details to each gpu in gres.conf. The docs aren't really clear about this because they show an example with the details for each gpu: AutoDetect=nvml Nam
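For what it's worth, a minimal autodetect setup can look like the sketch below; the node name and GPU count are assumptions, and the node definition in slurm.conf still has to advertise the GRES even when gres.conf relies on NVML.

  # gres.conf - NVML discovers the device files, GPU types and CPU affinity
  AutoDetect=nvml

  # slurm.conf - the GRES still has to be declared on the node
  GresTypes=gpu
  NodeName=gpu-node01 Gres=gpu:2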

[slurm-users] Oversubscribe partition not oversubscribing

2020-02-05 Thread Matthew Brown
*I apologize if this comes up as a repost of my message from about a week ago. I think I had not officially joined the group when I first posted and perhaps sent it to the wrong email address.* I'm trying to set up a small partition where oversubscription is allowed. I want to be able to have several jobs
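One common way to declare such a partition (a sketch only; the partition name, node range and limits are made up) is with the OverSubscribe option on the partition definition:

  # slurm.conf - let up to 4 jobs share the same allocated resources
  PartitionName=shared Nodes=node0[01-02] OverSubscribe=FORCE:4 MaxTime=1-00:00:00 State=UP
  # With the cons_res/cons_tres select plugins, FORCE:<count> oversubscribes
  # without the jobs asking for it; OverSubscribe=YES only applies to jobs
  # submitted with --oversubscribe.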

Re: [slurm-users] Longer queuing times for larger jobs

2020-02-05 Thread Loris Bennett
Hello David, David Baker writes: > Hello, > > I've taken a very good look at our cluster, however as yet not made > any significant changes. The one change that I did make was to > increase the "jobsizeweight". That's now our dominant parameter and it > does ensure that our largest jobs (> 20 no
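The parameter being referred to is presumably PriorityWeightJobSize from the multifactor priority plugin; a hedged slurm.conf sketch with purely illustrative weights:

  PriorityType=priority/multifactor
  PriorityWeightJobSize=100000    # favour jobs that request many nodes
  PriorityWeightAge=10000
  PriorityWeightFairshare=50000
  # sprio -l breaks a pending job's priority down by factor, which makes it
  # easy to check whether job size really is the dominant term.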

Re: [slurm-users] Limits to partitions for users groups

2020-02-05 Thread Renfro, Michael
If you want to rigidly define which 20 nodes are available to the one group of users, you could define a 20-node partition for them, and a 35-node partition for the priority group, and restrict access by Unix group membership: PartitionName=restricted Nodes=node0[01-20] AllowGroups=ALL Partition
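The truncated example presumably continues with the second partition; a hedged completion (the Unix group name is an assumption) could read:

  PartitionName=restricted Nodes=node0[01-20] AllowGroups=ALL
  PartitionName=priority   Nodes=node0[01-35] AllowGroups=priogrp
  # Nodes may belong to more than one partition, so the priority group can
  # reach all 35 nodes while everyone else is confined to the first 20.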

[slurm-users] Limits to partitions for users groups

2020-02-05 Thread Рачко Антон Сергеевич
I have a partition with 35 nodes. Many users use it, but one group of them has higher priority than the others. I want to set a limit of max. 20 nodes for ordinary users and allow users in the priority group to use all nodes. I can split this partition into 2: a 20-node partition for everyone and a 15-node partition for the priority gro
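An alternative that avoids carving up the hardware (a sketch that assumes accounting is enabled with AccountingStorageEnforce including limits; the partition, QOS and group names are made up) is to keep one 35-node partition for everyone, cap it with a partition QOS, and overlay an uncapped partition for the priority group:

  # slurm.conf
  PartitionName=batch    Nodes=node0[01-35] AllowGroups=ALL     QOS=batch_limit
  PartitionName=priority Nodes=node0[01-35] AllowGroups=priogrp

  # sacctmgr - limit each ordinary user to 20 nodes of running jobs
  sacctmgr add qos batch_limit
  sacctmgr modify qos batch_limit set MaxTRESPerUser=node=20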

Re: [slurm-users] sbatch script won't accept --gres that requires more than 1 gpu

2020-02-05 Thread Marcus Wagner
I had this same issue again today: "sbatch: error: CPU count per node can not be satisfied" followed by "sbatch: error: Batch job submission failed: Requested node configuration is not available". After restarting slurmctld, the user could submit his job with the very same job script. One of the oddities o
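For context, the kind of job script involved here would look roughly like the sketch below (the partition name, CPU count and program are assumptions); the error normally means the controller's view of the node does not match what the script requests.

  #!/bin/bash
  #SBATCH --partition=gpu
  #SBATCH --nodes=1
  #SBATCH --ntasks=1
  #SBATCH --cpus-per-task=4
  #SBATCH --gres=gpu:2        # two GPUs on a single node
  srun ./my_gpu_program

Comparing the output of scontrol show node <nodename> before and after the slurmctld restart would show whether the controller's idea of the node's Gres and CPU counts had drifted.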