Hi,

While commissioning a new cluster, I wanted to run HPL across the whole thing using Open MPI 2.0.1.

I couldn't get it to start on more than 129 hosts under Son of Grid Engine (128 remote plus the localhost running the mpirun command). openmpi would sit there waiting for all the orteds to check in; however, there were "only" ever a maximum of 128 qrsh processes, therefore a maximum of 128 orteds, therefore a very long wait.

Increasing plm_rsh_num_concurrent beyond the default of 128 gets the job to launch.
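For reference, this is roughly how I raised it (the value 256 and the xhpl binary name are just examples; size the parameter to your host count):

```shell
# Raise the cap on concurrent qrsh/ssh daemon launches
# (plm_rsh_num_concurrent defaults to 128) for one run:
mpirun --mca plm_rsh_num_concurrent 256 ./xhpl

# Or set it persistently in the per-user MCA parameter file:
echo "plm_rsh_num_concurrent = 256" >> $HOME/.openmpi/mca-params.conf
```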

Is this intentional, please?

Doesn't openmpi sometimes use a tree-based startup? Any particular reason it isn't using one here?

Cheers,

Mark
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users