That's a big cluster to be starting with rsh! :-)

When you say it won't start, do you mean that it hangs? Or does it fail with 
some error message? How many nodes are involved (this is the important number, 
not the number of cores)?

Also, what version are you using?

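For reference, here is a minimal sketch of the two usual ways to set the MCA parameter mentioned in the quoted message, assuming a bash-like shell. You can export it as an OMPI_MCA_ environment variable before calling mpirun, or pass it for a single run with mpirun's --mca option. The process count and ./my_app are hypothetical placeholders.

```shell
# Assumes a bash-like shell. Open MPI picks up MCA parameters from
# environment variables named OMPI_MCA_<param>, so this export applies
# to every mpirun launched later from this shell.
export OMPI_MCA_plm_rsh_num_concurrent=256

# Equivalent per-run form (hypothetical placeholders: -np 2500, ./my_app):
#   mpirun --mca plm_rsh_num_concurrent 256 -np 2500 ./my_app

# Print the value that mpirun will inherit from the environment.
echo "plm_rsh_num_concurrent=${OMPI_MCA_plm_rsh_num_concurrent}"
```

The environment-variable form is convenient in a job script. The --mca form keeps the setting scoped to one launch.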

On Dec 14, 2010, at 9:10 AM, Lydia Heck wrote:

> 
> About 9 months ago we installed a new system with 1800 cores, and at the 
> time we found that jobs with more than 1028 cores would not start. A 
> colleague found that setting
> 
> OMPI_MCA_plm_rsh_num_concurrent=256
> 
> helped with the problem.
> 
> We have now increased our processor count to more than 2700 cores, and a job 
> with 2,500 cores does not start.
> 
> Is there any advice?
> 
> Best wishes,
> 
> Lydia Heck
> ------------------------------------------
> Dr E L Heck
> Senior Computer Manager
> 
> University of Durham Institute for Computational Cosmology
> Ogden Centre
> Department of Physics South Road
> 
> DURHAM, DH1 3LE United Kingdom
> 
> e-mail: lydia.h...@durham.ac.uk
> 
> Tel.: + 44 191 - 334 3628
> Fax.: + 44 191 - 334 3645
> ___________________________________________
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

