Hi Alper,

Maybe this is relevant to you:
*Can Slurm emulate nodes with more resources than physically exist on the node?*

Yes. In the slurm.conf file, configure *SlurmdParameters=config_overrides* and specify any desired node resource specifications (*CPUs*, *Sockets*, *CoresPerSocket*, *ThreadsPerCore*, and/or *TmpDisk*). Slurm will use the resource specification for each node given in *slurm.conf* and will not check these specifications against those actually found on the node. The system is best configured with *TaskPlugin=task/none*, so that launched tasks can run on any available CPU under operating system control.

Best

On Sat, Nov 13, 2021 at 4:10 AM Alper Alimoglu <alper.alimo...@gmail.com> wrote:

> My goal is to set up a single-server `slurm` cluster (using only a single
> computer) that can run multiple jobs in parallel.
>
> On my node `nproc` returns 4, so I believe I can run 4 jobs in parallel if
> each uses a single core. To do this I run the controller and the worker
> daemon on the same node.
> When I submit four jobs at the same time, only one of them is able to run
> and the other three are not, with the following message: `queued and
> waiting for resources`.
>
> I am using `Ubuntu 20.04.3 LTS`. I have observed that this approach was
> working on tag versions `<=19`:
>
> ```
> $ git clone https://github.com/SchedMD/slurm ~/slurm && cd ~/slurm
> $ git checkout e2e21cb571ce88a6dd52989ec6fe30da8c4ef15f  # slurm-19-05-8-1
> $ ./configure --enable-debug --enable-front-end --enable-multiple-slurmd
> $ sudo make && sudo make install
> ```
>
> but it does not work on higher versions such as `slurm 20.02.1` or the
> `master` branch.
> ------
>
> ```
> ❯ sinfo
> Sat Nov 06 14:17:04 2021
> NODELIST  NODES  PARTITION  STATE  CPUS  S:C:T  MEMORY  TMP_DISK  WEIGHT  AVAIL_FE  REASON
> home1     1      debug*     idle   1     1:1:1  1       0         1       (null)    none
> home2     1      debug*     idle   1     1:1:1  1       0         1       (null)    none
> home3     1      debug*     idle   1     1:1:1  1       0         1       (null)    none
> home4     1      debug*     idle   1     1:1:1  1       0         1       (null)    none
> $ srun -N1 sleep 10  # runs
> $ srun -N1 sleep 10  # queued and waiting for resources
> $ srun -N1 sleep 10  # queued and waiting for resources
> $ srun -N1 sleep 10  # queued and waiting for resources
> ```
>
> Here I get lost, since in [emulate mode][1] they should be able to run in
> parallel.
>
> The way I build from the source code:
>
> ```bash
> git clone https://github.com/SchedMD/slurm ~/slurm && cd ~/slurm
> ./configure --enable-debug --enable-multiple-slurmd
> make
> sudo make install
> ```
>
> --------
>
> ```
> $ hostname -s
> home
> $ nproc
> 4
> ```
>
> ##### Compute-node setup:
>
> ```
> NodeName=home[1-4] NodeHostName=home NodeAddr=127.0.0.1 CPUs=1 ThreadsPerCore=1 Port=17001
> PartitionName=debug Nodes=ALL Default=YES MaxTime=INFINITE State=UP Shared=FORCE:1
> ```
>
> I have also tried `NodeHostName=localhost`.
>
> `slurm.conf` file:
>
> ```bash
> ControlMachine=home  # $(hostname -s)
> ControlAddr=127.0.0.1
> ClusterName=cluster
> SlurmUser=alper
> MailProg=/home/user/slurm_mail_prog.sh
> MinJobAge=172800  # 48 h
> SlurmdSpoolDir=/var/spool/slurmd
> SlurmdLogFile=/var/log/slurm/slurmd.%n.log
> SlurmdPidFile=/var/run/slurmd.%n.pid
> AuthType=auth/munge
> CryptoType=crypto/munge
> MpiDefault=none
> ProctrackType=proctrack/pgid
> ReturnToService=1
> SlurmctldPidFile=/var/run/slurmctld.pid
> SlurmdPort=6820
> SlurmctldPort=6821
> StateSaveLocation=/tmp/slurmstate
> SwitchType=switch/none
> TaskPlugin=task/none
> InactiveLimit=0
> Waittime=0
> SchedulerType=sched/backfill
> SelectType=select/linear
> PriorityDecayHalfLife=0
> PriorityUsageResetPeriod=NONE
> AccountingStorageEnforce=limits
> AccountingStorageType=accounting_storage/slurmdbd
> AccountingStoreFlags=YES
> JobCompType=jobcomp/none
> JobAcctGatherFrequency=30
> JobAcctGatherType=jobacct_gather/none
> NodeName=home[1-2] NodeHostName=home NodeAddr=127.0.0.1 CPUs=2 ThreadsPerCore=1 Port=17001
> PartitionName=debug Nodes=ALL Default=YES MaxTime=INFINITE State=UP Shared=FORCE:1
> ```
>
> `slurmdbd.conf`:
>
> ```bash
> AuthType=auth/munge
> AuthInfo=/var/run/munge/munge.socket.2
> DbdAddr=localhost
> DbdHost=localhost
> SlurmUser=alper
> DebugLevel=4
> LogFile=/var/log/slurm/slurmdbd.log
> PidFile=/var/run/slurmdbd.pid
> StorageType=accounting_storage/mysql
> StorageUser=alper
> StoragePass=12345
> ```
>
> The way I run slurm:
>
> ```
> sudo /usr/local/sbin/slurmd
> sudo /usr/local/sbin/slurmdbd &
> sudo /usr/local/sbin/slurmctld -cDvvvvvv
> ```
>
> ---------
>
> Related:
>
> - minimum number of computers for a slurm cluster
>   (https://stackoverflow.com/a/27788311/2402577)
> - [Running multiple worker daemons SLURM](https://stackoverflow.com/a/40707189/2402577)
> - https://stackoverflow.com/a/47009930/2402577
>
> [1]: https://slurm.schedmd.com/faq.html#multi_slurmd
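Putting the FAQ answer together with the node definitions quoted above, a minimal sketch of the relevant `slurm.conf` lines might look like this. This is only a sketch: the node names, address, and port are taken from your own config, and the `SlurmdParameters=config_overrides` line is the addition the FAQ describes for emulating resources that differ from the hardware.

```
# Sketch only -- node names/addresses/ports follow the config quoted above.
# config_overrides tells Slurm to trust these specs instead of probing hardware.
SlurmdParameters=config_overrides
TaskPlugin=task/none

# Four emulated single-CPU nodes on one physical host:
NodeName=home[1-4] NodeHostName=home NodeAddr=127.0.0.1 CPUs=1 ThreadsPerCore=1 Port=17001
PartitionName=debug Nodes=ALL Default=YES MaxTime=INFINITE State=UP Shared=FORCE:1
```

One further thing worth checking: when slurmd is built with `--enable-multiple-slurmd`, you normally start one slurmd per emulated node, naming each with `-N` (e.g. `sudo slurmd -N home1`, `sudo slurmd -N home2`, ...), rather than a single bare `slurmd` as in the quoted startup commands.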