As I recall, the problem was that qrsh isn’t available on the backend compute 
nodes, so we can’t use a tree-based launch there. If that isn’t true, then we 
can certainly change that behavior.
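
To make the stall described in the quoted message concrete, here is a toy Python model of a flat (non-tree) launch. It is only a sketch, not Open MPI's actual launch code. It assumes that each qrsh session stays open until every orted has checked in, so a used slot is never freed for the remaining hosts. The function names are hypothetical; the default of 128 is the value quoted in Mark's message.

```python
DEFAULT_NUM_CONCURRENT = 128  # default for plm_rsh_num_concurrent quoted in the message


def daemons_started(remote_hosts: int,
                    num_concurrent: int = DEFAULT_NUM_CONCURRENT) -> int:
    """Number of daemons that can start in a flat launch from one node.

    Assumption (not taken from the Open MPI source): every launch session
    stays open until all daemons have checked in, so no slot is freed
    for the hosts that are still waiting.
    """
    if remote_hosts < 0 or num_concurrent <= 0:
        raise ValueError("remote_hosts must be >= 0 and num_concurrent > 0")
    return min(remote_hosts, num_concurrent)


def launch_completes(remote_hosts: int,
                     num_concurrent: int = DEFAULT_NUM_CONCURRENT) -> bool:
    """True if every remote host gets a daemon, False if the launch stalls."""
    return daemons_started(remote_hosts, num_concurrent) == remote_hosts


if __name__ == "__main__":
    for hosts, limit in [(128, 128), (129, 128), (129, 256)]:
        status = "completes" if launch_completes(hosts, limit) else "stalls"
        print(f"{hosts} remote hosts, limit {limit}: launch {status}")
```

Under this model the launch completes only when the limit is at least the number of remote hosts, which matches the workaround of raising the limit, for example with `mpirun --mca plm_rsh_num_concurrent 256 ...` (256 is just an illustrative value).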

> On Jan 17, 2017, at 9:37 AM, Mark Dixon <m.c.di...@leeds.ac.uk> wrote:
> 
> Hi,
> 
> While commissioning a new cluster, I wanted to run HPL across the whole thing 
> using openmpi 2.0.1.
> 
> I couldn't get it to start on more than 129 hosts under Son of Grid Engine 
> (128 remote plus the localhost running the mpirun command). Open MPI would sit 
> there, waiting for all the orteds to check in; however, there were "only" a 
> maximum of 128 qrsh processes, and therefore a maximum of 128 orteds, so it 
> would wait a very long time.
> 
> Increasing plm_rsh_num_concurrent beyond the default of 128 gets the job to 
> launch.
> 
> Is this intentional, please?
> 
> Doesn't openmpi use a tree-like startup sometimes - any particular reason 
> it's not using it here?
> 
> Cheers,
> 
> Mark
> _______________________________________________
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
