Hello Ralph,

I wonder: does the plm_rsh_num_concurrent parameter apply ONLY to rsh,
or to either ssh or rsh, depending on plm_rsh_agent?

 Thanks,  Best,   G.


On 14/12/2010 18:30, Ralph Castain wrote:
That's a big cluster to be starting with rsh! :-)

When you say it won't start, do you mean that it hangs? Or does it fail with 
some error message? How many nodes are involved (this is the important number, 
not the number of cores)?

Also, what version are you using?


On Dec 14, 2010, at 9:10 AM, Lydia Heck wrote:

About 9 months ago we had a new installation with a system of 1800 cores and at 
the time we found that jobs with more than 1028 cores would not start. At the 
time a colleague found that setting

OMPI_MCA_plm_rsh_num_concurrent=256

helped with the problem.
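
For reference, a minimal sketch of how an MCA parameter like this is typically set, assuming a POSIX shell. The value 256 is simply the one quoted above, and the application name and process count in the commented mpirun line are hypothetical placeholders, not taken from this thread.

```shell
# Environment-variable form: any MCA parameter can be set by prefixing its
# name with OMPI_MCA_. The value 256 is the one quoted in the message above.
export OMPI_MCA_plm_rsh_num_concurrent=256

# Per-job form on the mpirun command line (./my_app and -np 2500 are
# hypothetical placeholders):
#   mpirun --mca plm_rsh_num_concurrent 256 -np 2500 ./my_app

# Confirm the variable is set in the environment that will launch the job.
echo "plm_rsh_num_concurrent=${OMPI_MCA_plm_rsh_num_concurrent}"
```

The environment variable has to be exported in the shell that invokes mpirun; the command-line form affects only that one job.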

We have now increased our processor count to more than 2700 cores, and a job 
with 2,500 cores does not start.

Is there any advice?

Best wishes,

Lydia Heck
------------------------------------------
Dr E L Heck
Senior Computer Manager

University of Durham Institute for Computational Cosmology
Ogden Centre
Department of Physics South Road

DURHAM, DH1 3LE United Kingdom

e-mail: lydia.h...@durham.ac.uk

Tel.: + 44 191 - 334 3628
Fax.: + 44 191 - 334 3645
___________________________________________
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


--
 Regards,   Gilbert.

--
*---------------------------------------------------------------------*
  Gilbert Grosdidier             gilbert.grosdid...@in2p3.fr
  LAL / IN2P3 / CNRS                 Phone : +33 1 6446 8909
  Faculté des Sciences, Bat. 200     Fax   : +33 1 6446 8546
  B.P. 34, F-91898 Orsay Cedex (FRANCE)
*---------------------------------------------------------------------*
