errno 24 means "Too many open files".
Looks like you may be hitting the upper limit for the number of open file
descriptors.
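Check /proc/sys/fs/file-max (the system-wide limit) and see if you need to bump it up.
errno 24 is EMFILE, which is the per-process limit, so bumping up "ulimit -n" for the
process that reports the error is more likely to help and is worth a try.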
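
If it helps, here is a rough Python sketch for inspecting both limits on a node. It is only an illustration under assumptions: the helper names (describe_errno, nofile_limits, system_file_max) are made up for this example, the /proc path exists only on Linux, and nothing here is part of Open MPI or SLURM.

```python
import errno
import os
import resource


def describe_errno(code):
    """Return (symbolic name, message) for an errno value on this platform."""
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


def nofile_limits():
    """Return the (soft, hard) per-process open-file limits.

    The soft value is what "ulimit -n" reports in a shell.
    """
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def system_file_max(path="/proc/sys/fs/file-max"):
    """Return the system-wide file handle limit, or None if it cannot be read.

    The path is Linux-specific; other systems return None here.
    """
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    name, message = describe_errno(24)
    print("errno 24:", name, "-", message)

    soft, hard = nofile_limits()
    print("per-process open files: soft =", soft, "hard =", hard)

    print("system-wide file-max:", system_file_max())
```

Run it on the node that reports the accept() failure, from the same environment that launches the job, since limits are inherited from the launching shell. If the soft limit is well below the number of connections that process has to accept, that is the one to raise.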
-Aleph
On 10/14/06, Adam Moody wrote:
Hello,
I'm trying to run a 500 node job using mpirun / slurm with OpenMPI-1.1.1
and see the following errors at startup:
[rhea342:09444] [0,1,318]-[0,0,0] mca_oob_tcp_peer_recv_blocking: recv()
failed with errno=104
[rhea32:13463] mca_oob_tcp_accept: accept() failed with errno 24.
[rhea32:134