I have experimented a bit more and found that if I set

OMPI_MCA_plm_rsh_num_concurrent=1024

a job with more than 2,500 processes will start and run.

However, when I searched the Open MPI web site for this variable, I could not find any mention of it.
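
For reference, a minimal sketch of how this setting can be applied and inspected. Open MPI accepts MCA parameters either as environment variables of the form OMPI_MCA_<name> or through the --mca option of mpirun, and ompi_info can list a component's parameters. The application name ./my_app and the process count are placeholders, not taken from the actual job.

```shell
# Option 1: environment variable (the form used in this message).
export OMPI_MCA_plm_rsh_num_concurrent=1024

# Option 2: the same parameter passed on the mpirun command line.
# ./my_app and -np 2500 are placeholders for the real job.
# mpirun --mca plm_rsh_num_concurrent 1024 -np 2500 ./my_app

# List the parameters of the rsh launcher (plm framework, rsh component)
# with their current values, if ompi_info is on the PATH.
if command -v ompi_info >/dev/null 2>&1; then
    ompi_info --param plm rsh
fi
```

If the parameter does not appear in the ompi_info output, the installed release may hide it by default; newer releases may need an extra verbosity option such as --level 9 to show it.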

Best wishes,
Lydia Heck




------------------------------

Message: 15
Date: Tue, 14 Dec 2010 16:10:01 +0000 (GMT)
From: Lydia Heck <lydia.h...@durham.ac.uk>
Subject: [OMPI users] jobs with more that 2,500 processes will not
        even start
To: us...@open-mpi.org
Message-ID:
        <alpine.lrh.2.00.1012141549220.20...@dubris.phyast.dur.ac.uk>
Content-Type: TEXT/PLAIN; format=flowed; charset=US-ASCII


About 9 months ago we had a new installation with a system of 1800 cores and at
the time we found that jobs with more than 1028 cores would not start. At the
time a colleague found that setting

OMPI_MCA_plm_rsh_num_concurrent=256

helped with the problem.

We have now increased our processor count to more than 2700 cores and a job with
2,500 processes does not start.

Is there any advice?

Best wishes,

Lydia Heck
