Hello, I am setting up a very simple configuration: one node running slurmd and another one running slurmctld.
On the slurmctld machine I run:

    srun -v -p debug bash -i

And get this output:

    srun: defined options
    srun: -------------------- --------------------
    srun: partition : debug
    srun: verbose : 1
    srun: -------------------- --------------------
    srun: end of defined options
    srun: jobid 41: nodes(1):`test116', cpu counts: 1(x1)
    srun: CpuBindType=(null type)
    srun: launching 41.0 on host test116, 1 tasks: 0
    srun: route default plugin loaded
    srun: error: task 0 launch failed: Slurmd could not set up environment for batch job
    srun: Node test116, 1 tasks started

Enabled debug logging in slurmd:

    slurmd: debug3: in the service_connection
    slurmd: debug2: Start processing RPC: REQUEST_LAUNCH_TASKS
    slurmd: debug2: Processing RPC: REQUEST_LAUNCH_TASKS
    slurmd: launch task 45.0 request from UID:1000 GID:1000 HOST:169.254.1.32 PORT:2300
    slurmd: debug3: state for jobid 42: ctime:1581056522 revoked:1581056522 expires:1581056642
    slurmd: debug3: state for jobid 43: ctime:1581056533 revoked:1581056533 expires:1581056653
    slurmd: debug3: state for jobid 44: ctime:1581056623 revoked:1581056623 expires:1581056743
    slurmd: debug: Checking credential with 384 bytes of sig data
    slurmd: debug: task affinity : before lllp distribution cpu bind method is '(null type)' ((null))
    slurmd: debug3: task/affinity: slurmctld s 1 c 1; hw s 1 c 1 t 1
    slurmd: debug3: task/affinity: job 45.0 core mask from slurmctld: 0x1
    slurmd: debug3: task/affinity: job 45.0 CPU final mask for local node: 0x00000000000000000001
    slurmd: debug3: _lllp_map_abstract_masks
    slurmd: debug: binding tasks:1 to nodes:1 sockets:1:0 cores:1:0 threads:1
    slurmd: lllp_distribution jobid [45] implicit auto binding: sockets,one_thread, dist 8192
    slurmd: _task_layout_lllp_cyclic
    slurmd: debug3: task/affinity: slurmctld s 1 c 1; hw s 1 c 1 t 1
    slurmd: debug3: task/affinity: job 45.0 core mask from slurmctld: 0x1
    slurmd: debug3: task/affinity: job 45.0 CPU final mask for local node: 0x00000000000000000001
    slurmd: debug3: _task_layout_display_masks jobid [45:0] 0x00000000000000000001
    slurmd: debug3: _lllp_map_abstract_masks
    slurmd: debug3: _task_layout_display_masks jobid [45:0] 0x00000000000000000001
    slurmd: debug3: _lllp_generate_cpu_bind 1 23 24
    slurmd: _lllp_generate_cpu_bind jobid [45]: mask_cpu,one_thread, 0x00000000000000000001
    slurmd: debug: task affinity : after lllp distribution cpu bind method is 'mask_cpu,one_thread' (0x00000000000000000001)
    slurmd: debug2: _insert_job_state: we already have a job state for job 45. No big deal, just an FYI.
    slurmd: _run_prolog: run job script took usec=4
    slurmd: _run_prolog: prolog with lock for job 45 ran for 0 seconds
    slurmd: debug3: _rpc_launch_tasks: call to _forkexec_slurmstepd
    slurmd: debug3: slurmstepd rank 0 (test116), parent rank -1 (NONE), children 0, depth 0, max_depth 0
    slurmd: debug3: _rpc_launch_tasks: return from _forkexec_slurmstepd
    slurmd: debug: task_p_slurmd_reserve_resources: 45
    slurmd: debug2: Finish processing RPC: REQUEST_LAUNCH_TASKS
    slurmd: debug3: in the service_connection
    slurmd: debug2: Start processing RPC: REQUEST_TERMINATE_JOB
    slurmd: debug2: Processing RPC: REQUEST_TERMINATE_JOB
    slurmd: debug: _rpc_terminate_job, uid = 1000
    slurmd: debug: task_p_slurmd_release_resources: affinity jobid 45
    slurmd: debug: credential for job 45 revoked
    slurmd: debug2: No steps in jobid 45 to send signal 18
    slurmd: debug2: No steps in jobid 45 to send signal 15
    slurmd: debug4: sent ALREADY_COMPLETE
    slurmd: debug2: set revoke expiration for jobid 45 to 1581056754 UTS
    slurmd: debug2: Finish processing RPC: REQUEST_TERMINATE_JOB

Any ideas what could be going wrong here?

Thanks
--
-h