Ronan, as far as I can see this means that you cannot launch a job. What state are the compute nodes in when you run sinfo?
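
If it helps, here is a rough sketch of how the node states could be checked in one go. It assumes the Slurm client tools are on the PATH and that `sinfo -N -h -o "%N %T"` prints one "node state" pair per line; the function names and the set of "healthy" states are my own choices for illustration, not anything Slurm defines. A state with a trailing `*` (e.g. `idle*`) means the node is not responding to the controller, so it is flagged too.

```python
import subprocess

# Assumption: states in which a node can normally accept or run work.
HEALTHY_STATES = {"idle", "mixed", "allocated", "completing"}


def find_problem_nodes(sinfo_output):
    """Return (node, state) pairs for nodes that are not in a healthy state.

    Expects lines of the form "<node> <state>", as printed by
    `sinfo -N -h -o "%N %T"`. Suffixed states such as "idle*" do not
    match the healthy set and are therefore reported.
    """
    problems = []
    seen = set()
    for line in sinfo_output.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        node, state = parts
        if (node, state) in seen:
            continue  # a node is listed once per partition it belongs to
        seen.add((node, state))
        if state not in HEALTHY_STATES:
            problems.append((node, state))
    return problems


def query_sinfo():
    """Run sinfo and return its output (requires Slurm client tools)."""
    result = subprocess.run(
        ["sinfo", "-N", "-h", "-o", "%N %T"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout
```

On a login node, `find_problem_nodes(query_sinfo())` would list any nodes worth a closer look with `scontrol show node <name>`.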

On 17 July 2018 at 10:08, Buckley, Ronan <ronan.buck...@dell.com> wrote:

> Yes, srun just hangs. Commands like sinfo and squeue run fine.
>
> I also have no slurm logs in /var/log ??
>
> *From:* slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] *On Behalf Of *John Hearns
> *Sent:* Tuesday, July 17, 2018 8:57 AM
> *To:* Slurm User Community List
> *Subject:* Re: [slurm-users] 'srun hostname' hangs on the command line
>
> Ronan, sorry to ask but this is a bit unclear.
>
> Are you unable to launch ANY sessions with srun?
>
> In which case you need to look at the logs to see why the job is not being scheduled.
>
> Is it only the hostname command which fails?
>
> I would guess very much you have already run an ssh into a node and run the hostname command manually.
>
> On 17 July 2018 at 09:50, Buckley, Ronan <ronan.buck...@dell.com> wrote:
>
> Yes I do.
>
> *From:* slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] *On Behalf Of *Williams, Gareth (IM&T, Clayton)
> *Sent:* Tuesday, July 17, 2018 12:33 AM
> *To:* Slurm User Community List
> *Subject:* Re: [slurm-users] 'srun hostname' hangs on the command line
>
> Do you get the same problem as a non-root user?
>
> *From:* slurm-users [mailto:slurm-users-boun...@lists.schedmd.com <slurm-users-boun...@lists.schedmd.com>] *On Behalf Of *Buckley, Ronan
> *Sent:* Tuesday, 17 July 2018 12:53 AM
> *To:* slurm-users@lists.schedmd.com
> *Subject:* [slurm-users] 'srun hostname' hangs on the command line
>
> Hi All,
>
> Verbose mode doesn’t show much.
>
> I hashed out the hostnames.
>
> Any ideas/suggestions?
>
> *# srun hostname*
> *^Csrun: interrupt (one more within 1 sec to abort)*
> *srun: task 0: unknown*
> *^Z*
> *[1]+ Stopped srun hostname*
> *#*
>
> *# srun -v hostname*
> *srun: defined options for program `srun'*
> *srun: --------------- ---------------------*
> *srun: user : `root'*
> *srun: uid : 0*
> *srun: gid : 0*
> *srun: cwd : /root*
> *srun: ntasks : 1 (default)*
> *srun: nodes : 1 (default)*
> *srun: jobid : 4294967294 (default)*
> *srun: partition : default*
> *srun: profile : `NotSet'*
> *srun: job name : `(null)'*
> *srun: reservation : `(null)'*
> *srun: burst_buffer : `(null)'*
> *srun: wckey : `(null)'*
> *srun: cpu_freq_min : 4294967294*
> *srun: cpu_freq_max : 4294967294*
> *srun: cpu_freq_gov : 4294967294*
> *srun: switches : -1*
> *srun: wait-for-switches : -1*
> *srun: distribution : unknown*
> *srun: cpu_bind : default (0)*
> *srun: mem_bind : default (0)*
> *srun: verbose : 1*
> *srun: slurmd_debug : 0*
> *srun: immediate : false*
> *srun: label output : false*
> *srun: unbuffered IO : false*
> *srun: overcommit : false*
> *srun: threads : 60*
> *srun: checkpoint_dir : /var/slurm/checkpoint*
> *srun: wait : 0*
> *srun: nice : -2*
> *srun: account : (null)*
> *srun: comment : (null)*
> *srun: dependency : (null)*
> *srun: exclusive : false*
> *srun: bcast : false*
> *srun: qos : (null)*
> *srun: constraints :*
> *srun: geometry : (null)*
> *srun: reboot : yes*
> *srun: rotate : no*
> *srun: preserve_env : false*
> *srun: network : (null)*
> *srun: propagate : NONE*
> *srun: prolog : (null)*
> *srun: epilog : (null)*
> *srun: mail_type : NONE*
> *srun: mail_user : (null)*
> *srun: task_prolog : (null)*
> *srun: task_epilog : (null)*
> *srun: multi_prog : no*
> *srun: sockets-per-node : -2*
> *srun: cores-per-socket : -2*
> *srun: threads-per-core : -2*
> *srun: ntasks-per-node : -2*
> *srun: ntasks-per-socket : -2*
> *srun: ntasks-per-core : -2*
> *srun: plane_size : 4294967294*
> *srun: core-spec : NA*
> *srun: power :*
> *srun: remote command : `hostname'*
> *srun: Waiting for nodes to boot (delay looping 450 times @ 0.100000 secs x index)*
> *srun: Nodes ####### are ready for job*
> *srun: jobid 50871: nodes(1):`#######', cpu counts: 64(x1)*
> *srun: launching 50871.0 on host #######, 1 tasks: 0*
> *srun: route default plugin loaded*
> *srun: error: timeout waiting for task launch, started 0 of 1 tasks*
> *srun: Job step 50871.0 aborted before step completely launched.*
> *srun: Job step aborted: Waiting up to 32 seconds for job step to finish.*
> *srun: error: Timed out waiting for job step to complete*
> *#*
>
> Rgds