I have experimented a bit more and found that if I set
OMPI_MCA_plm_rsh_num_concurrent=1024
a job with more than 2,500 processes will start and run.
However, when I searched the Open MPI web site for this variable, I could not
find any documentation for it.
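For reference, here is a minimal sketch of ways to apply this setting. It assumes a standard Open MPI installation that reads MCA parameters from `OMPI_MCA_*` environment variables and from the per-user file `$HOME/.openmpi/mca-params.conf`. The value 1024 is the one reported above, and `./my_app` is a hypothetical executable name. On Open MPI releases of this era, `ompi_info --param plm rsh` should list `plm_rsh_num_concurrent` with a short description (newer releases may also need `--level 9`).

```shell
#!/bin/sh
# Option 1: set the MCA parameter for the current shell session;
# mpirun picks up OMPI_MCA_<param_name> variables from its environment.
export OMPI_MCA_plm_rsh_num_concurrent=1024

# Option 2: record it in the per-user MCA parameter file so that every
# run by this user uses it (one "name = value" entry per line).
mkdir -p "$HOME/.openmpi"
printf '%s\n' 'plm_rsh_num_concurrent = 1024' >> "$HOME/.openmpi/mca-params.conf"

# Option 3 (not executed here): pass it for a single run only, e.g.
#   mpirun --mca plm_rsh_num_concurrent 1024 -np 2500 ./my_app
# where ./my_app is a hypothetical MPI executable.
```

Any one of these options is enough. If more than one is used, Open MPI gives the command line precedence over the environment variable, and the environment variable precedence over the parameter file.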
Best wishes,
Lydia Heck
15. jobs with more that 2, 500 processes will not even start
(Lydia Heck)
------------------------------
Message: 15
Date: Tue, 14 Dec 2010 16:10:01 +0000 (GMT)
From: Lydia Heck <lydia.h...@durham.ac.uk>
Subject: [OMPI users] jobs with more that 2, 500 processes will not
even start
To: us...@open-mpi.org
Message-ID:
<alpine.lrh.2.00.1012141549220.20...@dubris.phyast.dur.ac.uk>
Content-Type: TEXT/PLAIN; format=flowed; charset=US-ASCII
About 9 months ago we installed a new system with 1800 cores, and at the
time we found that jobs using more than 1028 cores would not start. A
colleague then found that setting
OMPI_MCA_plm_rsh_num_concurrent=256
helped with the problem.
We have now increased our processor count to more than 2700 cores, and a job with
2,500 processes does not start.
Is there any advice?
Best wishes,
Lydia Heck