Following up on this in case anyone can provide some insight, please. On Thu, May 16, 2024 at 8:32 AM Dan Healy <daniel.t.he...@gmail.com> wrote:
> Hi there, SLURM community,
>
> I swear I've done this before, but now it's failing on a new cluster I'm
> deploying. We have 6 compute nodes with 64 CPUs each (384 CPUs total).
> When I run `srun -n 500 hostname`, the job gets queued since there aren't
> 500 available CPUs.
>
> Wasn't there an option that allows this to run so that the first 384
> tasks execute, and the remaining tasks execute when resources free up?
>
> Here's my conf:
>
> # Slurm Cgroup Configs used on controllers and workers
> slurm_cgroup_config:
>   CgroupAutomount: yes
>   ConstrainCores: yes
>   ConstrainRAMSpace: yes
>   ConstrainSwapSpace: yes
>   ConstrainDevices: yes
>
> # Slurm conf file settings
> slurm_config:
>   AccountingStorageType: "accounting_storage/slurmdbd"
>   AccountingStorageEnforce: "limits"
>   AuthAltTypes: "auth/jwt"
>   ClusterName: "cluster"
>   AccountingStorageHost: "{{ hostvars[groups['controller'][0]].ansible_hostname }}"
>   DefMemPerCPU: 1024
>   InactiveLimit: 120
>   JobAcctGatherType: "jobacct_gather/cgroup"
>   JobCompType: "jobcomp/none"
>   MailProg: "/usr/bin/mail"
>   MaxArraySize: 40000
>   MaxJobCount: 100000
>   MinJobAge: 3600
>   ProctrackType: "proctrack/cgroup"
>   ReturnToService: 2
>   SelectType: "select/cons_tres"
>   SelectTypeParameters: "CR_Core_Memory"
>   SlurmctldTimeout: 30
>   SlurmctldLogFile: "/var/log/slurm/slurmctld.log"
>   SlurmdLogFile: "/var/log/slurm/slurmd.log"
>   SlurmdSpoolDir: "/var/spool/slurm/d"
>   SlurmUser: "{{ slurm_user.name }}"
>   SrunPortRange: "60000-61000"
>   StateSaveLocation: "/var/spool/slurm/ctld"
>   TaskPlugin: "task/affinity,task/cgroup"
>   UnkillableStepTimeout: 120
>
> --
> Thanks,
>
> Daniel Healy

--
Thanks,

Daniel Healy
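One pattern that gives roughly the behavior described above, sketched here
under the assumption that the 500 tasks are independent of one another, is
a throttled job array rather than a single srun: the %limit suffix on
--array caps how many array elements run at once, and the remaining
elements are dispatched as CPUs free up. The run_task.sh script name below
is a placeholder, not anything from the conf in the thread.

    # Sketch, not a confirmed answer to the question above:
    # submit 500 single-task array elements, with at most 384
    # running at any one time; the rest start as CPUs become free.
    sbatch --array=1-500%384 --ntasks=1 run_task.sh

With MaxArraySize set to 40000 in the conf above, a 500-element array is
well within this cluster's limits. Whether this is the option being
half-remembered in the original question is a guess.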