Ok! Good, so the servers are there. You should expect to see go2's
hostname as the output of:

    srun -w go2 hostname

Alternatively, you should get a different hostname each time you run, for
instance:

    srun --time=0-06:00 --mem=8gb "$@" --pty -u bash -i

Try running a stress test that requests more than one node, with a CPU
count greater than the number of CPUs on a single node - that should show
multiple nodes. Hopefully.

cheers
L.

------
"The antidote to apocalypticism is *apocalyptic civics*. Apocalyptic
civics is the insistence that we cannot ignore the truth, nor should we
panic about it. It is a shared consciousness that our institutions have
failed and our ecosystem is collapsing, yet we are still here — and we
are creative agents who can shape our destinies. Apocalyptic civics is
the conviction that the only way out is through, and the only way through
is together."

*Greg Bloom* @greggish
https://twitter.com/greggish/status/873177525903609857

On 28 July 2017 at 10:57, 허웅 <[email protected]> wrote:

> Here is my output of sinfo
>
> [root@GO1]~# sinfo -N
> NODELIST   NODES  PARTITION  STATE
> sgo1       1      party*     idle
> sgo2       1      party*     idle
> sgo3       1      party*     idle
> sgo4       1      party*     idle
> sgo5       1      party*     idle
>
> [root@GO1]~# sn
> Fri Jul 28 09:55:53 2017
> HOSTNAMES
> GO1
> GO2
> GO3
> GO4
> GO5
>
> -----Original Message-----
> *From:* "Lachlan Musicman" <[email protected]>
> *To:* "slurm-dev" <[email protected]>;
> *Sent:* 2017-07-28 (Fri) 09:51:40
> *Subject:* [slurm-dev] Re: Why my slurm is running on only one node?
>
> Also - are the nodes up and running wrt Slurm? What is the output of:
>
> sinfo -N
>
> ?
>
> (fwiw, I really like the alias sn="sinfo -Nle -o "%.20n %.15C %.8O %.7t" | uniq")
>
> cheers
> L.
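[Editor's note: the per-node check suggested above can be sketched as a
short script. The node names GO1-GO5 are the ones from this thread; the
script assumes it is run from a submit host where srun can reach every
node.]

```shell
#!/bin/sh
# Run `hostname` on each node explicitly; -w pins the job to that node.
# Getting five different hostnames back means slurmd on every node is
# reachable and able to run jobs, not just the one on the controller.
for node in GO1 GO2 GO3 GO4 GO5; do
    srun -w "$node" hostname
done
```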
On 28 July 2017 at 10:47, Lachlan Musicman <[email protected]> wrote:

> I think it's because hostname is so undemanding.
>
> How many CPUs does each host have?
>
> You may need to use ((number of cpus per host) + 1) to see action on
> another node.
>
> You can try using stress-ng to test higher loads:
>
> https://www.cyberciti.biz/faq/stress-test-linux-unix-server-with-stress-ng/
>
> cheers
> L.
>
> On 28 July 2017 at 10:28, 허웅 <[email protected]> wrote:
>
> I have 5 nodes, including the control node.
>
> My nodes look like this:
>
> Control Node : GO1
> Compute Nodes : GO[1-5]
>
> When I try to allocate a job to multiple nodes, only one node works.
>
> Example:
>
> $ srun -N5 hostname
> GO1
> GO1
> GO1
> GO1
> GO1
>
> Even though I expected:
>
> $ srun -N5 hostname
> GO1
> GO2
> GO3
> GO4
> GO5
>
> What should I do?
>
> Here are my configs.
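[Editor's note: the stress-ng suggestion above could look like the sketch
below. It assumes stress-ng is installed on the compute nodes and that
each node has 8 CPUs; both the CPU count and the timeout are placeholder
values.]

```shell
# Request more tasks than one node's CPUs can hold (16 tasks against an
# assumed 8 CPUs per node), so Slurm must spread the job over at least
# two nodes; each task burns one CPU for 60 seconds.
srun -N2 -n16 stress-ng --cpu 1 --timeout 60s
```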
> $ scontrol show frontend
> FrontendName=GO1 State=IDLE Version=17.02 Reason=(null)
>    BootTime=2017-06-02T20:14:39 SlurmdStartTime=2017-07-27T16:29:46
> FrontendName=GO2 State=IDLE Version=17.02 Reason=(null)
>    BootTime=2017-07-05T17:54:13 SlurmdStartTime=2017-07-27T16:30:07
> FrontendName=GO3 State=IDLE Version=17.02 Reason=(null)
>    BootTime=2017-07-05T17:22:58 SlurmdStartTime=2017-07-27T16:30:08
> FrontendName=GO4 State=IDLE Version=17.02 Reason=(null)
>    BootTime=2017-07-05T17:21:40 SlurmdStartTime=2017-07-27T16:30:08
> FrontendName=GO5 State=IDLE Version=17.02 Reason=(null)
>    BootTime=2017-07-05T17:21:39 SlurmdStartTime=2017-07-27T16:30:09
>
> $ scontrol ping
> Slurmctld(primary/backup) at GO1/(NULL) are UP/DOWN
>
> [slurm.conf]
> # slurm.conf
> #
> # See the slurm.conf man page for more information.
> #
> ClusterName=linux
> ControlMachine=GO1
> ControlAddr=192.168.30.74
> #
> SlurmUser=slurm
> SlurmctldPort=6817
> SlurmdPort=6818
> AuthType=auth/munge
> StateSaveLocation=/var/lib/slurmd
> SlurmdSpoolDir=/var/spool/slurmd
> SwitchType=switch/none
> MpiDefault=none
> SlurmctldPidFile=/var/run/slurmd/slurmctld.pid
> SlurmdPidFile=/var/run/slurmd/slurmd.pid
> ProctrackType=proctrack/pgid
> ReturnToService=0
> TreeWidth=50
> #
> # TIMERS
> SlurmctldTimeout=300
> SlurmdTimeout=300
> InactiveLimit=0
> MinJobAge=300
> KillWait=30
> Waittime=0
> #
> # SCHEDULING
> SchedulerType=sched/backfill
> FastSchedule=1
> #
> # LOGGING
> SlurmctldDebug=7
> SlurmctldLogFile=/var/log/slurmctld.log
> SlurmdDebug=7
> SlurmdLogFile=/var/log/slurmd.log
> JobCompType=jobcomp/none
> #
> # COMPUTE NODES
> NodeName=sgo[1-5] NodeHostName=GO[1-5]
> #NodeAddr=192.168.30.[74,141,68,70,72]
> #
> # PARTITIONS
> PartitionName=party Default=yes Nodes=ALL
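[Editor's note: for comparison, a conventional compute-node section of
slurm.conf names the nodes directly, as in the sketch below. This is only
a sketch, not a confirmed fix for this thread: the CPU count is a
placeholder, and the addresses are the ones commented out in the config
above.]

```
# COMPUTE NODES (sketch; take CPUs/RealMemory from `slurmd -C` on each node)
NodeName=GO[1-5] NodeAddr=192.168.30.[74,141,68,70,72] CPUs=4 State=UNKNOWN
#
# PARTITIONS
PartitionName=party Nodes=GO[1-5] Default=YES State=UP
```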
