Ok! Good, so the servers are there.

You should expect to see output from

srun -w go2 hostname

alternatively, you should get a different hostname if you run

srun --time=0-06:00 --mem=8gb --pty -u bash -i

for instance.
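Putting those together, a quick sanity check might look like this (the node name go2 is taken from your sinfo output; the time and memory values are just the ones from the command above, adjust as needed):

```shell
# Pin the step to one specific node; the printed hostname
# should match the node you asked for.
srun -w go2 hostname

# An interactive shell lands wherever the scheduler finds room,
# so `hostname` run inside it can differ from the submit host.
srun --time=0-06:00 --mem=8gb --pty -u bash -i
hostname   # run this inside the interactive shell
```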

Try running a stress test that requests more than one node, with more CPUs
than a single node has; that should spread across multiple nodes. Hopefully.
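A minimal sketch of such a test as a batch script. The per-node CPU count of 8 is an assumption (check yours with sinfo -o "%n %c"), and it assumes stress-ng is installed on the compute nodes:

```shell
#!/bin/bash
#SBATCH --job-name=spread-test
#SBATCH --nodes=2          # ask for more than one node
#SBATCH --ntasks=9         # more tasks than one 8-CPU node can hold
#SBATCH --time=00:10:00

# One `hostname` per task; with more tasks than one node's CPUs,
# at least two different node names should appear in the output.
srun hostname | sort | uniq -c

# Optionally keep the CPUs busy for a minute so the multi-node
# allocation is visible in squeue while it runs.
srun stress-ng --cpu 1 --timeout 60s
```

Submit it with sbatch and check the output file for more than one node name.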

cheers
L.




------
"The antidote to apocalypticism is *apocalyptic civics*. Apocalyptic civics
is the insistence that we cannot ignore the truth, nor should we panic
about it. It is a shared consciousness that our institutions have failed
and our ecosystem is collapsing, yet we are still here — and we are
creative agents who can shape our destinies. Apocalyptic civics is the
conviction that the only way out is through, and the only way through is
together. "

*Greg Bloom* @greggish
https://twitter.com/greggish/status/873177525903609857

On 28 July 2017 at 10:57, 허웅 <[email protected]> wrote:

> Here is my output of sinfo
>
>
>
> [root@GO1]~# sinfo -N
>
> NODELIST   NODES PARTITION STATE
> sgo1           1    party* idle
> sgo2           1    party* idle
> sgo3           1    party* idle
> sgo4           1    party* idle
> sgo5           1    party* idle
>
> [root@GO1]~# sn
> Fri Jul 28 09:55:53 2017
>            HOSTNAMES
>                  GO1
>                  GO2
>                  GO3
>                  GO4
>                  GO5
>
>
>
> -----Original Message-----
> *From:* "Lachlan Musicman"<[email protected]>
> *To:* "slurm-dev"<[email protected]>;
> *Cc:*
> *Sent:* 2017-07-28 (Fri) 09:51:40
> *Subject:* [slurm-dev] Re: Why my slurm is running on only one node?
>
> Also - are the nodes up and running as far as Slurm is concerned? What is the output of:
>
> sinfo -N
>
> ?
>
> (FWIW, I really like the alias sn='sinfo -Nle -o "%.20n %.15C %.8O %.7t" | uniq')
>
> cheers
> L.
>
>
> On 28 July 2017 at 10:47, Lachlan Musicman <[email protected]> wrote:
>
> I think it's because hostname is so undemanding.
>
> How many CPUs does each host have?
>
> You may need to request ((number of CPUs per host) + 1) tasks to see action
> on another node.
>
> You can try using stress-ng to generate higher loads:
>
> https://www.cyberciti.biz/faq/stress-test-linux-unix-server-with-stress-ng/
>
> cheers
> L.
>
>
>
> On 28 July 2017 at 10:28, 허웅 <[email protected]> wrote:
>
> I have 5 nodes, including the control node.
>
> My nodes look like this:
>
> Control Node : GO1
> Compute Nodes : GO[1-5]
>
> When I try to allocate a job to multiple nodes, only one node does the work.
>
> Example:
>
> $ srun -N5 hostname
> GO1
> GO1
> GO1
> GO1
> GO1
>
> even though I expected something like this:
>
> $ srun -N5 hostname
> GO1
> GO2
> GO3
> GO4
> GO5
>
> What should I do?
>
> Here are some of my configuration details:
>
> $ scontrol show frontend
> FrontendName=GO1 State=IDLE Version=17.02 Reason=(null)
> BootTime=2017-06-02T20:14:39 SlurmdStartTime=2017-07-27T16:29:46
>
> FrontendName=GO2 State=IDLE Version=17.02 Reason=(null)
> BootTime=2017-07-05T17:54:13 SlurmdStartTime=2017-07-27T16:30:07
>
> FrontendName=GO3 State=IDLE Version=17.02 Reason=(null)
> BootTime=2017-07-05T17:22:58 SlurmdStartTime=2017-07-27T16:30:08
>
> FrontendName=GO4 State=IDLE Version=17.02 Reason=(null)
> BootTime=2017-07-05T17:21:40 SlurmdStartTime=2017-07-27T16:30:08
>
> FrontendName=GO5 State=IDLE Version=17.02 Reason=(null)
> BootTime=2017-07-05T17:21:39 SlurmdStartTime=2017-07-27T16:30:09
>
> $ scontrol ping
> Slurmctld(primary/backup) at GO1/(NULL) are UP/DOWN
>
> [slurm.conf]
> # slurm.conf
> #
> # See the slurm.conf man page for more information.
> #
> ClusterName=linux
> ControlMachine=GO1
> ControlAddr=192.168.30.74
> #
> SlurmUser=slurm
> SlurmctldPort=6817
> SlurmdPort=6818
> AuthType=auth/munge
> StateSaveLocation=/var/lib/slurmd
> SlurmdSpoolDir=/var/spool/slurmd
> SwitchType=switch/none
> MpiDefault=none
> SlurmctldPidFile=/var/run/slurmd/slurmctld.pid
> SlurmdPidFile=/var/run/slurmd/slurmd.pid
> ProctrackType=proctrack/pgid
> ReturnToService=0
> TreeWidth=50
> #
> # TIMERS
> SlurmctldTimeout=300
> SlurmdTimeout=300
> InactiveLimit=0
> MinJobAge=300
> KillWait=30
> Waittime=0
> #
> # SCHEDULING
> SchedulerType=sched/backfill
> FastSchedule=1
> #
> # LOGGING
> SlurmctldDebug=7
> SlurmctldLogFile=/var/log/slurmctld.log
> SlurmdDebug=7
> SlurmdLogFile=/var/log/slurmd.log
> JobCompType=jobcomp/none
> #
> # COMPUTE NODES
> NodeName=sgo[1-5] NodeHostName=GO[1-5] #NodeAddr=192.168.30.[74,141,68,70,72]
>
> #
> # PARTITIONS
> PartitionName=party Default=yes Nodes=ALL
>
>
>
>
>
