Not sure if this is a SLURM or OMPI issue, so please bear with the cross-posting...
The Open MPI FAQ mentions an issue with Slurm 2.6.3/PMI2:
https://www.open-mpi.org/faq/?category=slurm#slurm-2.6.3-issue

I have built both Open MPI 1.7.5 and 1.8 against Slurm 14.03 with PMI2. When I launch openmpi/examples/hello_c on a single-node allocation with srun --mpi=pmi2 -N 1 hello_c, the job step aborts:

  $ srun -N 1 --mpi=pmi2 hello_c
  srun: error: _server_read: fd 18 got error or unexpected eof reading header
  srun: error: step_launch_notify_io_failure: aborting, io error with slurmstepd on node 0
  srun: Job step aborted: Waiting up to 2 seconds for job step to finish.
  srun: error: Timed out waiting for job step to complete

With --slurmd-debug=9 I get the output below. (I'm not sure what "ip 111.110.61.48 sd 14" means. Is that an IP address? It is not the address of any node in my partition.)

  slurmstepd: mpi/pmi2: client_resp_send: 26 cmd=kvs-put-response;rc=0;
  slurmstepd: mpi/pmi2: _tree_listen_readable
  slurmstepd: mpi/pmi2: _task_readable
  slurmstepd: mpi/pmi2: got client request: 14 cmd=kvs-fence;
  slurmstepd: mpi/pmi2: _tree_listen_readable
  slurmstepd: mpi/pmi2: _task_readable
  slurmstepd: mpi/pmi2: _tree_listen_read
  slurmstepd: _tree_listen_read: accepted tree connection: ip 111.110.61.48 sd 14
  slurmstepd: _handle_accept_rank: going to read() client rank
  slurmstepd: _handle_accept_rank: got client rank 1478164480 on fd 14
  srun: error: _server_read: fd 18 got error or unexpected eof reading header
  srun: error: step_launch_notify_io_failure: aborting, io error with slurmstepd on node 0
  srun: Job step aborted: Waiting up to 2 seconds for job step to finish.
  srun: error: Timed out waiting for job step to complete

Launching with salloc/sbatch works.

- Anthony
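A quick check on the "ip 111.110.61.48" value from the debug log: packing its four octets back into bytes shows that every byte is printable ASCII, spelling "on=0". That may hint (an assumption, not confirmed by anything in this report) that the address is being read from a buffer holding text rather than a genuine peer address. The equally implausible client rank of 1478164480 on a single-node job would fit the same picture. The helper names octets_as_ascii and int32_bytes_le below are hypothetical, written only for this sketch.

```python
import ipaddress


def octets_as_ascii(addr: str) -> str:
    """Render each octet of an IPv4 dotted quad as an ASCII character.

    Non-printable bytes are shown as '.', like the text column of a hex dump.
    """
    raw = ipaddress.IPv4Address(addr).packed  # 4 bytes, network (big-endian) order
    return "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in raw)


def int32_bytes_le(value: int) -> str:
    """Hex of a 32-bit integer as its bytes would sit in memory on a little-endian host."""
    return value.to_bytes(4, "little").hex()


if __name__ == "__main__":
    # Values copied from the slurmstepd debug output in the report.
    print(octets_as_ascii("111.110.61.48"))  # prints: on=0
    print(int32_bytes_le(1478164480))        # prints: 00001b58
```

A real node address would normally decode to mostly non-printable bytes (for example 10.0.0.1 renders as "...."), so an all-ASCII result is at least worth mentioning when reporting the bug.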