Following up on this in case anyone can provide some insight, please.

On Thu, May 16, 2024 at 8:32 AM Dan Healy <daniel.t.he...@gmail.com> wrote:

> Hi there, SLURM community,
>
> I swear I've done this before, but now it's failing on a new cluster I'm
> deploying. We have 6 compute nodes with 64 CPUs each (384 CPUs total). When I
> run `srun -n 500 hostname`, the job gets queued since there aren't 500
> CPUs available.
>
> Wasn't there an option that allows this to run so that the first 384
> tasks execute immediately and the remaining ones execute as resources free up?
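>
> For reference, the behavior I'm after looks like what a throttled job array
> gives you. A minimal sketch (illustrative only, not what I actually ran):
>
>     # hypothetical: submit 500 independent hostname tasks; at most 384 run
>     # at once, and the rest start as CPUs free up
>     sbatch --array=0-499%384 --wrap="hostname"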
>
> Here's my conf:
>
> # Slurm Cgroup Configs used on controllers and workers
> slurm_cgroup_config:
>   CgroupAutomount: yes
>   ConstrainCores: yes
>   ConstrainRAMSpace: yes
>   ConstrainSwapSpace: yes
>   ConstrainDevices: yes
>
> # Slurm conf file settings
> slurm_config:
>   AccountingStorageType: "accounting_storage/slurmdbd"
>   AccountingStorageEnforce: "limits"
>   AuthAltTypes: "auth/jwt"
>   ClusterName: "cluster"
>   AccountingStorageHost: "{{ hostvars[groups['controller'][0]].ansible_hostname }}"
>   DefMemPerCPU: 1024
>   InactiveLimit: 120
>   JobAcctGatherType: "jobacct_gather/cgroup"
>   JobCompType: "jobcomp/none"
>   MailProg: "/usr/bin/mail"
>   MaxArraySize: 40000
>   MaxJobCount: 100000
>   MinJobAge: 3600
>   ProctrackType: "proctrack/cgroup"
>   ReturnToService: 2
>   SelectType: "select/cons_tres"
>   SelectTypeParameters: "CR_Core_Memory"
>   SlurmctldTimeout: 30
>   SlurmctldLogFile: "/var/log/slurm/slurmctld.log"
>   SlurmdLogFile: "/var/log/slurm/slurmd.log"
>   SlurmdSpoolDir: "/var/spool/slurm/d"
>   SlurmUser: "{{ slurm_user.name }}"
>   SrunPortRange: "60000-61000"
>   StateSaveLocation: "/var/spool/slurm/ctld"
>   TaskPlugin: "task/affinity,task/cgroup"
>   UnkillableStepTimeout: 120
>
>
> --
> Thanks,
>
> Daniel Healy
>


-- 
Thanks,

Daniel Healy
