I think it's because hostname is so undemanding.

How many CPUs does each host have?

You may need to request ((number of CPUs per host) + 1) tasks to see
activity on another node.
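
For example, assuming each host has 8 CPUs (just a guess; check with
"scontrol show node" first), asking for one more task than fits on a
single host should force a second node into the allocation:

$ srun -n 9 hostname    # 9 tasks won't fit on one 8-CPU host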

You could try using stress-ng to generate higher loads:

https://www.cyberciti.biz/faq/stress-test-linux-unix-server-with-stress-ng/
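
Something like this (the CPU worker count and timeout are only
placeholders, tune them for your hosts) will keep each allocated node
busy for a while:

$ srun -N5 stress-ng --cpu 4 --timeout 30s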

cheers
L.


------
"The antidote to apocalypticism is *apocalyptic civics*. Apocalyptic civics
is the insistence that we cannot ignore the truth, nor should we panic
about it. It is a shared consciousness that our institutions have failed
and our ecosystem is collapsing, yet we are still here — and we are
creative agents who can shape our destinies. Apocalyptic civics is the
conviction that the only way out is through, and the only way through is
together. "

*Greg Bloom* @greggish
https://twitter.com/greggish/status/873177525903609857

On 28 July 2017 at 10:28, 허웅 <[email protected]> wrote:

> I have 5 nodes, including the control node.
>
> My nodes look like this:
>
> Control Node : GO1
> Compute Nodes : GO[1-5]
>
> When I try to allocate a job to multiple nodes, only one node does the work.
>
> Example:
>
> $ srun -N5 hostname
> GO1
> GO1
> GO1
> GO1
> GO1
>
> even though I expected something like this:
>
> $ srun -N5 hostname
> GO1
> GO2
> GO3
> GO4
> GO5
>
> What should I do?
>
> Here are some of my configuration details.
>
> $ scontrol show frontend
> FrontendName=GO1 State=IDLE Version=17.02 Reason=(null)
> BootTime=2017-06-02T20:14:39 SlurmdStartTime=2017-07-27T16:29:46
>
> FrontendName=GO2 State=IDLE Version=17.02 Reason=(null)
> BootTime=2017-07-05T17:54:13 SlurmdStartTime=2017-07-27T16:30:07
>
> FrontendName=GO3 State=IDLE Version=17.02 Reason=(null)
> BootTime=2017-07-05T17:22:58 SlurmdStartTime=2017-07-27T16:30:08
>
> FrontendName=GO4 State=IDLE Version=17.02 Reason=(null)
> BootTime=2017-07-05T17:21:40 SlurmdStartTime=2017-07-27T16:30:08
>
> FrontendName=GO5 State=IDLE Version=17.02 Reason=(null)
> BootTime=2017-07-05T17:21:39 SlurmdStartTime=2017-07-27T16:30:09
>
> $ scontrol ping
> Slurmctld(primary/backup) at GO1/(NULL) are UP/DOWN
>
> [slurm.conf]
> # slurm.conf
> #
> # See the slurm.conf man page for more information.
> #
> ClusterName=linux
> ControlMachine=GO1
> ControlAddr=192.168.30.74
> #
> SlurmUser=slurm
> SlurmctldPort=6817
> SlurmdPort=6818
> AuthType=auth/munge
> StateSaveLocation=/var/lib/slurmd
> SlurmdSpoolDir=/var/spool/slurmd
> SwitchType=switch/none
> MpiDefault=none
> SlurmctldPidFile=/var/run/slurmd/slurmctld.pid
> SlurmdPidFile=/var/run/slurmd/slurmd.pid
> ProctrackType=proctrack/pgid
> ReturnToService=0
> TreeWidth=50
> #
> # TIMERS
> SlurmctldTimeout=300
> SlurmdTimeout=300
> InactiveLimit=0
> MinJobAge=300
> KillWait=30
> Waittime=0
> #
> # SCHEDULING
> SchedulerType=sched/backfill
> FastSchedule=1
> #
> # LOGGING
> SlurmctldDebug=7
> SlurmctldLogFile=/var/log/slurmctld.log
> SlurmdDebug=7
> SlurmdLogFile=/var/log/slurmd.log
> JobCompType=jobcomp/none
> #
> # COMPUTE NODES
> NodeName=sgo[1-5] NodeHostName=GO[1-5] #NodeAddr=192.168.30.[74,141,68,70,72]
>
> #
> # PARTITIONS
> PartitionName=party Default=yes Nodes=ALL
>
